BAYESIAN ANALYSIS OF ITEM RESPONSE MODELS FOR BINARY DATA


By

ATALANTA GHOSH


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


1996


To my parents and teachers


ACKNOWLEDGEMENTS


I would like to express my sincere gratitude to Professors Malay Ghosh and Alan Agresti for being my advisors. Without their enormous patience, encouragement and guidance, it would not have been possible to complete this work. I consider myself extremely lucky to have had them as my dissertation advisors. I would like to thank my friends for their great help, especially during the final stage of this dissertation. I would also like to wholeheartedly thank my wife Sofia Paul for her invaluable support and help throughout the work.




TABLE OF CONTENTS


ACKNOWLEDGEMENTS

ABSTRACT

CHAPTERS

1 INTRODUCTION

1.1 Literature Review

1.2 Topic of this Dissertation

2 A UNIFIED BAYESIAN ANALYSIS OF ITEM RESPONSE MODELS FOR BINARY DATA

2.1 Introduction

2.2 Choice Of Priors

2.3 Implementation Of Bayes Procedures

2.4 Analysis Based on Data For Placement Tests

2.5 Example on Matched Pairs Data

2.5.1 Effect of Diagonal Elements

2.6 Uniform approximation of improper priors by proper priors

2.7 Consistency of Marginal Maximum Likelihood Estimator

3 HIERARCHICAL BAYESIAN ANALYSIS OF ITEM RESPONSE MODELS FOR BINARY DATA

3.1 Introduction

3.2 Bayes Procedures with Hierarchical Priors

3.3 Implementation Of Bayes Procedures

3.4 An Example

4 A UNIFORM BAYESIAN ANALYSIS OF TWO-PARAMETER ITEM RESPONSE MODELS FOR BINARY DATA

4.1 Introduction

4.2 Choice Of Priors

4.3 Implementation Of Bayes Procedures

4.4 An Example

5 SUMMARY AND FUTURE RESEARCH

BIBLIOGRAPHY

BIOGRAPHICAL SKETCH


Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment
of the Requirements for the Degree of
Doctor of Philosophy


BAYESIAN ANALYSIS OF ITEM RESPONSE MODELS FOR BINARY DATA


By


Atalanta Ghosh


December 1996

Chairpersons: Alan Agresti, Malay Ghosh
Major Department: Statistics

We present a unified Bayesian approach for the analysis of one- and two-parameter item response models, with special emphasis on the logit, probit, and log-log links. Necessary and sufficient conditions are found for the propriety of the posteriors under improper priors for these models. Bayes estimation is implemented using the Markov chain Monte Carlo integration technique, and is illustrated with an example from educational testing. Some relationships between the frequentist and the Bayesian procedures are also discussed.


CHAPTER 1


INTRODUCTION

1.1 Literature Review

Item response models are widely used for the analysis of psychometric data. Their origin can be traced back to the mid-thirties and early forties (cf. Richardson, 1936; Tucker, 1946). A systematic development of item response theory from the classical point of view owes much to the pioneering work of Lord (1952, 1953a, 1953b), Rasch (1960, 1961) and their associates. Among the many noteworthy recent contributions in the same vein, we may cite Andersen (1970, 1972, 1973), Bock and Lieberman (1970), Mislevy and Bock (1984), Swaminathan and Gifford (1981), and Hambleton and Swaminathan (1985).

To motivate item response models, consider as a specific example ability tests or attitude tests, where each individual answers a battery of questions. Let $X_{ij}$ denote the response of the $i$th individual to the $j$th question ($i = 1, \ldots, n$; $j = 1, \ldots, k$). Associated with the $i$th individual (or subject) is a subject parameter $\theta_i$ that expresses the capacity, ability or attitude of that individual in a given context. However, the distributions of the $X_{ij}$ depend not only on $\theta_i$ but also on some parameter $\alpha_j$, where $-\alpha_j$ represents the difficulty level of question $j$. Item response analysis models the distribution of $X_{ij}$ taking into account both $\theta_i$ and $\alpha_j$ ($i = 1, \ldots, n$; $j = 1, \ldots, k$).

For simplicity, throughout this dissertation we consider only the case when the $X_{ij}$ are binary random variables. This is, for example, the situation when $n$ examinees are answering "True/False" questions. Alternatively, the examinees may answer multiple choice questions where each answer is coded as "correct" or "incorrect". Let $p_{ij} = P(X_{ij} = 1)$ ($i = 1, \ldots, n$; $j = 1, \ldots, k$), where 1 denotes a correct response. Item response theory for binary responses is based on models which express the $p_{ij}$ as functions of certain parameters.

This dissertation concentrates on one- and two-parameter item response models. A one-parameter item response model in its most general form is given by

$$p_{ij} = F(\theta_i + \alpha_j), \qquad (1.1.1)$$

where $F$ is a distribution function. This is referred to as a one-parameter item response model since, as a function of $\theta_i$, the right-hand side of (1.1.1) has the form of a distribution function with location parameter $-\alpha_j$. The function $F^{-1}$ (or sometimes $F$ itself) is called a link function. The three most commonly used link functions are the logit, probit and log-log links. Logit and probit links are symmetric, while the log-log link is asymmetric. A logit link is based on the logistic distribution function, namely

$$F(x) = \frac{\exp(x)}{1 + \exp(x)}.$$

The resulting one-parameter item response model is the celebrated Rasch model (Rasch, 1960, 1961). A probit link is based on the standard normal distribution function $\Phi$. A log-log link is based on the extreme value distribution function $F(x) = \exp(-\exp(-x))$.
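The three links and the one-parameter model (1.1.1) can be sketched in a few lines of Python with NumPy. The function names are illustrative and do not come from any package. The log-log link uses the extreme value form $F(x) = \exp(-\exp(-x))$ that appears in Theorem 3 below, and the parameter values are arbitrary.

```python
import math

import numpy as np

# Vectorized error function from the standard library (otypes avoids issues with empty input).
_erf = np.vectorize(math.erf, otypes=[float])


def logit_cdf(x):
    """Logistic distribution function exp(x) / (1 + exp(x)), computed stably via tanh."""
    return 0.5 * (1.0 + np.tanh(np.asarray(x, dtype=float) / 2.0))


def probit_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def loglog_cdf(x):
    """Extreme value distribution function exp(-exp(-x)) (log-log link)."""
    return np.exp(-np.exp(-np.asarray(x, dtype=float)))


def one_parameter_probs(theta, alpha, cdf):
    """n x k matrix with entries p_ij = F(theta_i + alpha_j), as in (1.1.1)."""
    theta = np.asarray(theta, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return cdf(theta[:, None] + alpha[None, :])


theta = np.array([-1.0, 0.0, 1.5])  # illustrative subject parameters
alpha = np.array([0.5, -0.5])       # illustrative item parameters
for name, cdf in [("logit", logit_cdf), ("probit", probit_cdf), ("log-log", loglog_cdf)]:
    print(name)
    print(np.round(one_parameter_probs(theta, alpha, cdf), 3))
```

At $x = 0$ the three links give $0.5$, $0.5$ and $e^{-1} \approx 0.368$. The logit and probit links are symmetric about zero, and the log-log link is not.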

In the item response literature, the $\theta_i$ are referred to as "subject ability" parameters, while the $\alpha_j$ are referred to as "item" parameters. Despite the simplicity of one-parameter item response models and, in particular, the overwhelming popularity of the Rasch model (see, for example, the collection of essays in Fischer and Molenaar, 1995), these models are often criticized on the ground that they assume all the questions have equal power to discriminate between "good" and "poor" students, and therefore do not involve any discrimination parameters in addition to the item parameters. This problem is alleviated by introducing two-parameter item response models given by

$$p_{ij} = F(\gamma_j\theta_i + b_j) \qquad (1.1.2)$$

or equivalently by

$$p_{ij} = F[\gamma_j(\theta_i + \alpha_j)] \quad (b_j = \gamma_j\alpha_j). \qquad (1.1.3)$$

The $\gamma_j$ are referred to as "discrimination" parameters and take values in $(0, \infty)$ for each $j$. Models of this type are referred to as two-parameter item response models since, as a function of $\theta_i$, $p_{ij}$ has the form of a distribution function with location parameter $-\alpha_j$ and scale parameter $1/\gamma_j$.
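A quick numerical check confirms that parametrizations (1.1.2) and (1.1.3) describe the same model once $\alpha_j = b_j/\gamma_j$. The logit link and all numbers below are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np


def logistic(x):
    """Logistic distribution function, computed stably via tanh."""
    return 0.5 * (1.0 + np.tanh(np.asarray(x, dtype=float) / 2.0))


def probs_intercept_form(theta, gamma, b, cdf=logistic):
    """(1.1.2): p_ij = F(gamma_j * theta_i + b_j)."""
    return cdf(np.outer(theta, gamma) + np.asarray(b, dtype=float)[None, :])


def probs_location_scale_form(theta, gamma, alpha, cdf=logistic):
    """(1.1.3): p_ij = F(gamma_j * (theta_i + alpha_j))."""
    theta = np.asarray(theta, dtype=float)
    g = np.asarray(gamma, dtype=float)[None, :]
    return cdf(g * (theta[:, None] + np.asarray(alpha, dtype=float)[None, :]))


theta = np.array([-1.0, 0.3, 2.0])
gamma = np.array([0.8, 1.5])   # discrimination parameters, must lie in (0, inf)
b = np.array([0.4, -0.9])
alpha = b / gamma              # b_j = gamma_j * alpha_j
print(np.allclose(probs_intercept_form(theta, gamma, b),
                  probs_location_scale_form(theta, gamma, alpha)))  # True
```

Setting every $\gamma_j = 1$ recovers the one-parameter model (1.1.1).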

One may be interested in inference for the $\theta_i$, for the $\alpha_j$, or simultaneously for both the $\theta_i$ and the $\alpha_j$ in one-parameter item response models. Similarly, for two-parameter item response models, inference may involve one or more of the three sets of parameters $\{\theta_i\}$, $\{\alpha_j\}$ and $\{\gamma_j\}$. This dissertation primarily concentrates on inference for the $\{\alpha_j\}$ in one-parameter models and for the $\{\gamma_j, \alpha_j\}$ in two-parameter models.

We now briefly review some of the existing estimation procedures for one- and two-parameter item response models. Classical estimation of the $\{\alpha_j\}$ or $\{\alpha_j, \gamma_j\}$ uses the method of maximum likelihood. A major difficulty that arises in this context is that the maximum likelihood estimators (MLEs) of the item parameters are inconsistent. This is an example of the well-known Neyman-Scott phenomenon, first noted by Neyman and Scott (1948). They proved that in the balanced one-way normal ANOVA model, the MLE of the error variance is inconsistent as the number of cell means grows to infinity. In a series of articles, Andersen (1970, 1972, 1973) discussed frequentist inference for the Rasch model very extensively. One of his findings is that when $k = 2$ but $n \to \infty$, the MLE of $\alpha_1$ (or $\alpha_2$) is inconsistent. A proof of the inconsistency of the MLE for general $k$ was recently given in Ghosh (1995).
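The Neyman-Scott phenomenon can be illustrated by simulation. The sketch below uses Python with NumPy, and the cell size, variance and seed are arbitrary choices. It fits a balanced one-way normal model with a free mean for every cell. With two observations per cell, the MLE of the error variance settles near $\sigma^2/2$ rather than $\sigma^2$ as the number of cells grows.

```python
import numpy as np


def neyman_scott_mle(n_cells, cell_size=2, sigma2=1.0, seed=0):
    """MLE of the error variance in the balanced one-way model
    y_ir = mu_i + e_ir, e_ir ~ N(0, sigma2), with one free mean per cell."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, 5.0, size=n_cells)  # nuisance cell means
    y = mu[:, None] + rng.normal(0.0, np.sqrt(sigma2), size=(n_cells, cell_size))
    resid = y - y.mean(axis=1, keepdims=True)  # the cell means are the MLEs of mu_i
    return float((resid ** 2).sum() / y.size)  # the MLE divides by n*r, not n*(r - 1)


for n in (100, 10_000, 200_000):
    # Tends to (r - 1) / r * sigma2 = 0.5 for r = 2, not to sigma2 = 1.
    print(n, round(neyman_scott_mle(n), 3))
```

The number of nuisance means grows with the sample, so the bias factor $(r-1)/r$ never disappears. The item response setting has the same structure: one $\theta_i$ per subject and a fixed number $k$ of responses per subject.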




Andersen (1970) recommends avoiding the problem of inconsistent MLEs for the Rasch model by finding the MLEs of the $\alpha_j$ ($j = 1, \ldots, k$) conditional on sufficient statistics for the nuisance parameters $\theta_i$ ($i = 1, \ldots, n$). This approach, however, does not work when separate sufficient statistics do not exist for the nuisance parameters. This is the case, for instance, in the probit model $p_{ij} = \Phi(\theta_i + \alpha_j)$, $\Phi$ being the standard normal distribution function, where sufficient statistics do not exist for the nuisance parameters $(\theta_1, \ldots, \theta_n)$. In fact, the only item response model that admits nontrivial sufficient statistics for the $\theta_i$ is the Rasch model. Bock and Lieberman (1970) have advocated instead the use of marginal maximum likelihood estimates of the $\alpha_j$ ($j = 1, \ldots, k$), obtained by assigning some distribution to $\theta_1, \ldots, \theta_n$. Integrating out $\theta_1, \ldots, \theta_n$, one then finds the joint marginal distribution of the $X_{ij}$ ($i = 1, \ldots, n$; $j = 1, \ldots, k$), which involves only $\alpha_1, \ldots, \alpha_k$ and the parameters of the distribution of the $\theta_i$. Based on this distribution, one finds the MLEs of $\alpha_1, \ldots, \alpha_k$, usually referred to as the marginal MLEs. Similarly, for two-parameter models one can use marginal maximum likelihood estimation for the item parameters as well as the discrimination parameters. In this case one finds the marginal distribution of the $X_{ij}$ ($i = 1, \ldots, n$; $j = 1, \ldots, k$) involving the item parameters $\alpha_1, \ldots, \alpha_k$ and $\gamma_1, \ldots, \gamma_k$. Bock and Aitkin (1981) proposed an EM algorithm to obtain marginal maximum likelihood estimates of these parameters in the context of two-parameter models.
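Here is a minimal sketch of the marginal likelihood idea. It assumes, for illustration only, the probit link and a standard normal distribution for the $\theta_i$. The integral over each $\theta_i$ is approximated by Gauss-Hermite quadrature with NumPy's `hermegauss`, whose weight function is $\exp(-t^2/2)$. Maximizing `marginal_loglik` (a hypothetical name) over $\alpha$ would give the marginal MLEs. The optimizer and the EM algorithm of Bock and Aitkin (1981) are not shown.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

_erf = np.vectorize(math.erf, otypes=[float])


def probit_cdf(x):
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def marginal_pattern_prob(x_row, alpha, cdf=probit_cdf, n_nodes=40):
    """P(X_i = x_row) = E[prod_j F(theta + alpha_j)^x_j {1 - F(theta + alpha_j)}^(1 - x_j)]
    for theta ~ N(0, 1) (an assumed ability distribution), by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)          # weight function exp(-t^2 / 2)
    weights = weights / math.sqrt(2.0 * math.pi)  # now the weights sum to 1
    p = cdf(nodes[:, None] + np.asarray(alpha, dtype=float)[None, :])  # nodes x items
    x = np.asarray(x_row)[None, :]
    lik = np.prod(np.where(x == 1, p, 1.0 - p), axis=1)
    return float(weights @ lik)


def marginal_loglik(X, alpha, cdf=probit_cdf):
    """Log marginal likelihood of alpha for an n x k binary response matrix X."""
    return sum(math.log(marginal_pattern_prob(row, alpha, cdf)) for row in np.asarray(X))


alpha = np.array([0.3, -0.7, 1.1])
X = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
print(marginal_loglik(X, alpha))
```

Two checks on the quadrature are available. For a single item, the sketch reproduces the closed form $E\,\Phi(\theta + \alpha) = \Phi(\alpha/\sqrt{2})$ for $\theta \sim N(0, 1)$. Also, the probabilities of all $2^k$ response patterns sum to one.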

An alternative and attractive estimation procedure for item response models is the Bayesian procedure, which has received considerable attention over the years. However, the Bayesian approach itself admits variation in actual use. First, in a regular Bayesian approach, inference is based on a completely specified prior distribution for all the unknown parameters. However, unless one uses fully noninformative priors, such procedures lack robustness against misspecified priors. As an alternative, one may estimate some or all of the prior parameters from the marginal distributions of the observations and use the estimated prior to derive an estimated posterior, which is then used for inferential purposes. This is the so-called empirical Bayes (EB) approach, which has rapidly gained popularity over the past decade.

As an alternative to the EB method, one may use a hierarchical Bayes (HB) procedure, which assigns a prior distribution (often vague) to the unknown prior parameters rather than estimating them from the marginal distributions of the observations. The similarity between the two approaches is that both recognize the uncertainty in the prior parameters. Indeed, an EB method is often viewed as an approximation to an HB method, and the two approaches usually yield similar point estimates of the parameters of interest. The advantage of the HB method over the EB method is that the former yields more reliable estimates of the variances of the parameters of interest. This is because it models the uncertainty of the prior parameters in the form of a distribution rather than replacing them with point estimates.

We now briefly summarize the existing Bayesian work on item response models. In the Bayesian literature it is usually the item parameters that are of primary interest, but sometimes the subject parameters are of interest as well. Tsutakawa and Johnson (1990) employed Bayesian methods to estimate the subject ability parameters. They considered a three-parameter logistic model (of which the two-parameter model is a special case) and eventually used an EB approach for ability estimation. Birnbaum (1969) and Owen (1975) have provided Bayesian estimation of ability parameters when the item parameters are known. Owen (1975) assumed a normal prior with zero mean and unit standard deviation for the ability parameters, whereas Birnbaum (1969) assumed a logistic distribution for the same. Some authors have used Bayesian procedures for the joint estimation of both the ability and the item parameters. The Bayesian estimation of ability parameters can be traced back to the mid-sixties, but procedures for the joint estimation of both the ability and the item parameters are more recent. Swaminathan (1986) and Swaminathan and Gifford (1981, 1982, 1985) have provided joint Bayesian estimation of both the ability and the item parameters. They considered a two-parameter logistic model.

As mentioned earlier, our interest lies in the Bayesian estimation of the item parameters alone. Many Bayesian articles on the estimation of item parameters treat the subject ability parameters as nuisance parameters. Albert (1992) developed a regular Bayesian procedure for estimating item parameters using a two-parameter logistic model. He used a standard normal distribution as the prior for the subject ability parameters and a diffuse prior for the item parameters. Kim et al. (1994) used an HB approach for item parameter estimation in the context of the two-parameter logistic model. Swaminathan and Gifford (1982, 1985, 1986), Mislevy (1986) and Hambleton and Swaminathan (1985) have used the HB method for item parameter estimation in two-parameter models with the logit link. These authors used a standard normal distribution as the prior for the ability parameters, a first-stage normal prior for the item difficulty parameters and an inverse chi distribution for the item discrimination parameters. At the second stage they used diffuse priors for the item difficulty hyperparameters.

1.2 Topic of this Dissertation

The Bayesian methods that have been used so far in the literature are always link specific. The authors have used a specific link function, usually either logit or probit, and developed their estimation procedure for that link. Our primary objective is to provide a unified Bayesian methodology for item response models that can be implemented with any of the three links considered, namely logit, probit and log-log. A major concern in choosing priors is the potential impropriety of the resulting posteriors. Throughout this dissertation we carefully discuss the choice of priors leading to proper posteriors.

In Chapter 2 we discuss classical Bayesian methodology for one-parameter item response models and provide a general algorithm to implement it in any given situation. We also provide conditions under which marginal ML estimators are consistent. In this chapter we also devote a section to the approximation of improper priors by proper ones.

In Chapter 3 we deal with hierarchical Bayesian methods for item response models. We develop a hierarchical approach for one-parameter models and discuss the important issues of propriety of posteriors and implementation.

In Chapter 4 we discuss classical Bayesian methodology for two-parameter models and develop a unified theory to implement it in a given context. In all these chapters we illustrate our methodology with real-life data from a mathematics placement test.


CHAPTER 2

A UNIFIED BAYESIAN ANALYSIS OF ITEM RESPONSE MODELS FOR BINARY DATA

2.1 Introduction

As discussed in Chapter 1, there is a considerable non-Bayesian literature on one-parameter item response models. Possibly the most attention has been paid to the Rasch model, which has applications well beyond the psychology and sociology literature. The probit model has also been used in many different contexts.

Much of the Bayesian literature on this topic is devoted either to partial Bayes or to empirical Bayes estimation procedures. In these, some prior distribution is assigned to the subject and/or the item parameters. Then either these parameters are estimated simultaneously by their posterior modes, or the item parameters are estimated from their marginal model after integrating out the subject parameters. However, to date, a complete unified Bayesian analysis seems to be lacking for these models. The present chapter makes an attempt in this direction.

The outline of the remaining sections of this chapter is as follows. In Section 2.2 it is shown that Laplace's prior necessarily leads to an improper posterior, owing to the nonidentifiability of the likelihood. A necessary and sufficient condition for the propriety of posteriors under a nonidentifiable likelihood is provided. Next in that section, we consider flat priors for the item parameters and proper priors for the subject parameters, and give sufficient conditions under which these priors lead to proper posteriors. Implementation of the Bayes procedure via the Markov chain Monte Carlo integration technique is given in Section 2.3. The Bayesian technique is illustrated with a placement examination dataset in Section 2.4 for the logit, probit and log-log links. Section 2.5 considers a specific example of matched pairs data, and compares and contrasts the Bayesian technique with the marginal and conditional likelihoods. Section 2.6 discusses uniform approximation of improper priors by proper priors, extending the work of Mukhopadhyay and Dasgupta (1992) and Mukhopadhyay and Ghosh (1994) for location-scale models. Finally, Section 2.7 gives a formal proof of the consistency of the marginal maximum likelihood estimators for fairly general link functions.

2.2 Choice Of Priors

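For the one-parameter item response model, write $\theta = (\theta_1, \ldots, \theta_n)$, $\alpha = (\alpha_1, \ldots, \alpha_k)$, $x = (x_{11}, \ldots, x_{1k}, \ldots, x_{n1}, \ldots, x_{nk})$, and $\bar F = 1 - F$. The likelihood function is then given by

$$L(\theta, \alpha \mid x) = \prod_{i=1}^{n} \prod_{j=1}^{k} \left[ F^{x_{ij}}(\theta_i + \alpha_j)\, \bar F^{\,1 - x_{ij}}(\theta_i + \alpha_j) \right]. \qquad (2.2.1)$$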
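For computation one works with the logarithm of (2.2.1). The following is a minimal sketch that assumes the logit link by default; any distribution function can be passed as `cdf`. The function name and the small data matrix are illustrative.

```python
import numpy as np


def logistic(x):
    """Logistic distribution function, computed stably via tanh."""
    return 0.5 * (1.0 + np.tanh(np.asarray(x, dtype=float) / 2.0))


def log_likelihood(theta, alpha, X, cdf=logistic, tiny=1e-300):
    """Logarithm of (2.2.1):
    sum_ij [x_ij log F(theta_i + alpha_j) + (1 - x_ij) log Fbar(theta_i + alpha_j)],
    where Fbar = 1 - F. `tiny` guards against log(0) when F rounds to 0 or 1."""
    X = np.asarray(X, dtype=float)
    F = cdf(np.asarray(theta, dtype=float)[:, None] + np.asarray(alpha, dtype=float)[None, :])
    Fbar = 1.0 - F
    return float(np.sum(X * np.log(np.maximum(F, tiny))
                        + (1.0 - X) * np.log(np.maximum(Fbar, tiny))))


X = np.array([[1, 0, 1],
              [0, 0, 1]])
print(log_likelihood([0.2, -0.4], [0.1, -0.3, 0.8], X))
```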

We now consider different choices of priors for $\theta$ and $\alpha$. One straightforward approach is to use a subjective proper prior distribution $\pi(\theta, \alpha)$ for $\theta$ and $\alpha$, and find the posterior distribution of $\theta$ and $\alpha$ as

$$\pi(\theta, \alpha \mid x) \propto L(\theta, \alpha \mid x)\, \pi(\theta, \alpha). \qquad (2.2.2)$$

However, the prior information is often vague, in which case it is useful to consider noninformative priors.

The simplest noninformative prior, due to Laplace, is $\pi(\theta, \alpha) \propto 1$. However, this choice of prior leads to an improper posterior for $\theta$ and $\alpha$. This fact is a consequence of the following lemma. The lemma is possibly known to many Bayesians, and it seems to be implicit in Dawid (1979) and O'Hagan (1994) (p. 72 and p. 158). Nevertheless, the following explicit formulation seems worthwhile for future use.




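Lemma 1.

Suppose the parameter vector is $(\phi_1, \phi_2)$, where either or both of $\phi_1$ and $\phi_2$ may be vector valued. Suppose $X$ has a nonidentifiable pdf $f(x \mid \phi_1, \phi_2) = f(x \mid \phi_1)$. Consider the prior $\pi(\phi_1, \phi_2)$ for $(\phi_1, \phi_2)$. Then the posterior $\pi(\phi_1, \phi_2 \mid x)$ is proper if and only if $\pi(\phi_1 \mid x)$ and $\pi(\phi_2 \mid \phi_1)$ are both proper.

Proof.

$$\begin{aligned}
\pi(\phi_1, \phi_2 \mid x) &\propto f(x \mid \phi_1, \phi_2)\, \pi(\phi_1, \phi_2) \\
&= f(x \mid \phi_1)\, \pi(\phi_1)\, \pi(\phi_2 \mid \phi_1) \\
&\propto \pi(\phi_1 \mid x)\, \pi(\phi_2 \mid \phi_1).
\end{aligned}$$

This proves the lemma.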

Using the above lemma, it is possible to prove the following theorem, which shows that Laplace's prior necessarily leads to an improper posterior in the present context.

Theorem 1.

Consider the likelihood function given in (2.2.1) and the prior $\pi(\theta, \alpha) \propto 1$. Then the posterior $\pi(\theta, \alpha \mid x)$ is improper, that is,

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \pi(\theta, \alpha \mid x)\, d\theta\, d\alpha = \infty. \qquad (2.2.3)$$

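Proof.

Make the one-to-one linear transformation

$$\eta_i = \theta_i + \alpha_k, \quad i = 1, \ldots, n, \qquad \xi_j = \alpha_j - \alpha_k, \quad j = 1, \ldots, k - 1.$$

Write $\eta = (\eta_1, \ldots, \eta_n)$ and $\xi = (\xi_1, \ldots, \xi_{k-1})$. Then $(\theta, \alpha)$ is in one-to-one correspondence with $(\eta, \xi, \alpha_k)$. Since $\theta_i + \alpha_j = \eta_i + \xi_j$ for $j < k$ and $\theta_i + \alpha_k = \eta_i$, the likelihood function given in (2.2.1) can be rewritten as

$$L(\eta, \xi, \alpha_k) = \prod_{i=1}^{n} \prod_{j=1}^{k-1} \left[ F^{x_{ij}}(\eta_i + \xi_j)\, \bar F^{\,1 - x_{ij}}(\eta_i + \xi_j) \right] \prod_{i=1}^{n} \left[ F^{x_{ik}}(\eta_i)\, \bar F^{\,1 - x_{ik}}(\eta_i) \right]. \qquad (2.2.4)$$

The Jacobian of the transformation from $(\theta, \alpha)$ to $(\eta, \xi, \alpha_k)$ is a constant free of any parameter, so $\pi(\eta, \xi, \alpha_k) \propto 1$. This implies $\pi(\alpha_k \mid \eta, \xi) \propto 1$, which is improper. Now apply Lemma 1 with $\phi_1 = (\eta, \xi)$ and $\phi_2 = \alpha_k$ to conclude that $\pi(\eta, \xi, \alpha_k \mid x)$ is improper. Hence $\pi(\theta, \alpha \mid x)$ is also improper, which proves (2.2.3).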


Remark 1. Swaminathan and Gifford (1985, p. 353) hint that, with flat priors for both $\theta$ and $\alpha$, the Bayesian analysis is equivalent to a likelihood-based analysis. But the resulting posterior distribution is improper, so it makes very little sense to talk about posterior means, Bayesian credible sets, posterior quantiles and other posterior quantities.

Remark 2. The nonidentifiability of the likelihood becomes apparent with the reparametrization given in (2.2.4), since the likelihood does not involve $\alpha_k$. A similar nonidentifiability occurs for more complex item response models as well. Also, the result stated in Lemma 1 should be of use in other similar contexts.
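The nonidentifiability behind Theorem 1 and Remark 2 can be checked numerically. The sketch below assumes the logit link and simulated data, with an arbitrary seed and sizes. Shifting every $\theta_i$ by $c$ and every $\alpha_j$ by $-c$ leaves (2.2.1) unchanged, and the reparametrization of Theorem 1 reproduces each $\theta_i + \alpha_j$ for $j < k$. This shift is exactly the direction along which a flat prior cannot be normalized.

```python
import numpy as np


def logistic(x):
    """Logistic distribution function, computed stably via tanh."""
    return 0.5 * (1.0 + np.tanh(np.asarray(x, dtype=float) / 2.0))


def log_likelihood(theta, alpha, X):
    """Log of (2.2.1) under the logit link."""
    p = logistic(np.asarray(theta)[:, None] + np.asarray(alpha)[None, :])
    return float(np.sum(X * np.log(p) + (1 - X) * np.log1p(-p)))


rng = np.random.default_rng(1)
theta, alpha = rng.normal(size=6), rng.normal(size=4)
X = (rng.random((6, 4)) < logistic(theta[:, None] + alpha[None, :])).astype(int)

# Shifting theta by c and alpha by -c leaves every theta_i + alpha_j, hence L, unchanged.
for c in (0.0, 2.5, -10.0):
    print(c, log_likelihood(theta + c, alpha - c, X))

# The reparametrization of Theorem 1: eta_i = theta_i + alpha_k, xi_j = alpha_j - alpha_k.
eta, xi = theta + alpha[-1], alpha[:-1] - alpha[-1]
print(np.allclose(eta[:, None] + xi[None, :], theta[:, None] + alpha[None, :-1]))  # True
```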

Theorem 1 raises the question of whether the subset $(\eta, \xi)$ of $(\eta, \xi, \alpha_k)$ has a proper posterior, since the nonidentifiability problem then disappears. However, the answer is still negative in two cases: when, for at least one $i$, the $x_{ij}$ ($j = 1, \ldots, k$) are all zeros or all ones, or when, for at least one $j$, the $x_{ij}$ ($i = 1, \ldots, n$) are all zeros or all ones. To see this, write the likelihood given in (2.2.4) as $L(\eta, \xi)$. Also, let

$$t_i = \sum_{j=1}^{k} x_{ij}, \quad i = 1, \ldots, n, \qquad y_j = \sum_{i=1}^{n} x_{ij}, \quad j = 1, \ldots, k.$$

Then we prove the following result.

Theorem 2.

Suppose $t_i = 0$ or $k$ for at least one $i$. Then, with the prior $\pi(\eta, \xi) \propto 1$, $\pi(\eta, \xi \mid x)$ is improper. Also, if $y_j = 0$ or $n$ for at least one $j$, $\pi(\eta, \xi \mid x)$ is again improper under the same prior.

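Proof.

Suppose $t_m = 0$, that is, $x_{m1} = \cdots = x_{mk} = 0$. Since $\bar F$ is nonincreasing,

$$\int_{-\infty}^{\infty} \prod_{j=1}^{k-1} \bar F(\eta_m + \xi_j)\, \bar F(\eta_m)\, d\eta_m \ge \int_{-\infty}^{0} \prod_{j=1}^{k-1} \bar F(\xi_j)\, \bar F(0)\, d\eta_m = +\infty. \qquad (2.2.5)$$

Similarly, if $t_m = k$, that is, $x_{m1} = \cdots = x_{mk} = 1$, then, since $F$ is nondecreasing,

$$\int_{-\infty}^{\infty} \prod_{j=1}^{k-1} F(\eta_m + \xi_j)\, F(\eta_m)\, d\eta_m \ge \int_{0}^{\infty} \prod_{j=1}^{k-1} F(\xi_j)\, F(0)\, d\eta_m = +\infty. \qquad (2.2.6)$$

Also, if $y_l = 0$, that is, $x_{1l} = \cdots = x_{nl} = 0$, one gets

$$\int_{-\infty}^{\infty} \prod_{i=1}^{n} \bar F(\eta_i + \xi_l)\, d\xi_l \ge \int_{-\infty}^{0} \prod_{i=1}^{n} \bar F(\eta_i + \xi_l)\, d\xi_l \ge \prod_{i=1}^{n} \bar F(\eta_i) \int_{-\infty}^{0} d\xi_l = +\infty. \qquad (2.2.7)$$

Finally, if $y_l = n$, that is, $x_{1l} = \cdots = x_{nl} = 1$,

$$\int_{-\infty}^{\infty} \prod_{i=1}^{n} F(\eta_i + \xi_l)\, d\xi_l \ge \int_{0}^{\infty} \prod_{i=1}^{n} F(\eta_i + \xi_l)\, d\xi_l \ge \prod_{i=1}^{n} F(\eta_i) \int_{0}^{\infty} d\xi_l = +\infty. \qquad (2.2.8)$$

(Here $l \le k - 1$. If $l = k$, the same argument applies after taking a different item as the reference item in the transformation of Theorem 1.) The theorem now follows from (2.2.5)-(2.2.8).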

It appears, however, that once the boundary values are excluded for the $t_i$ and the $y_j$, the posterior $\pi(\eta, \xi \mid x)$ is proper. This result is proved below for $k = 2$ when the link function is logit, probit or log-log. It is conjectured that the result is true for arbitrary $k$, but we have not yet found a formal proof.
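Before using a flat prior on $(\eta, \xi)$, one can screen the data for the boundary cases of Theorem 2. In this minimal sketch, the function name and toy data are hypothetical. It computes the $t_i$ and $y_j$ and reports any subject or item at a boundary value, using 0-based indices.

```python
import numpy as np


def boundary_cases(X):
    """Return t_i (row totals), y_j (column totals) and the 0-based indices of
    subjects with t_i in {0, k} and items with y_j in {0, n} for a binary n x k matrix.
    By Theorem 2, any such index makes the flat-prior posterior of (eta, xi) improper."""
    X = np.asarray(X)
    n, k = X.shape
    t = X.sum(axis=1)
    y = X.sum(axis=0)
    bad_subjects = np.flatnonzero((t == 0) | (t == k))
    bad_items = np.flatnonzero((y == 0) | (y == n))
    return t, y, bad_subjects, bad_items


X = np.array([[1, 0, 1],
              [0, 0, 0],   # second subject: all responses incorrect, t = 0
              [1, 0, 0],
              [0, 0, 1]])  # note: no subject answers the second item correctly
t, y, bad_subjects, bad_items = boundary_cases(X)
print(t, y, bad_subjects, bad_items)  # [2 0 1 1] [2 0 2] [1] [1]
```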




Theorem 3.

Suppose $k = 2$, $\pi(\eta, \xi) \propto 1$, $0 < y_j < n$ and $t_i = 1$ for all $i$ and $j$. Then the posterior $\pi(\eta, \xi \mid x)$ is proper when (i) $F(x) = \exp(x)/[1 + \exp(x)]$, (ii) $F(x) = \Phi(x)$, the standard normal distribution function, or (iii) $F(x) = \exp(-\exp(-x))$.
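Proof.

First consider the logit link. Since $t_i = 1$ for all $i = 1, \ldots, n$, the posterior is

$$\pi(\eta, \xi_1 \mid x) \propto \prod_{i=1}^{n} \frac{\exp(\eta_i + x_{i1}\xi_1)}{\{1 + \exp(\eta_i + \xi_1)\}\{1 + \exp(\eta_i)\}}. \qquad (2.2.9)$$

We want to show that

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \pi(\eta, \xi_1 \mid x)\, d\eta\, d\xi_1 < \infty. \qquad (2.2.10)$$

Substituting $z = \exp(\eta_i)$,

$$\int_{-\infty}^{\infty} \frac{\exp(\eta_i)\, d\eta_i}{\{1 + \exp(\eta_i + \xi_1)\}\{1 + \exp(\eta_i)\}} = \int_{0}^{\infty} \frac{dz}{\{1 + z\exp(\xi_1)\}(1 + z)} = \xi_1[\exp(\xi_1) - 1]^{-1}. \qquad (2.2.11)$$

Hence, since $\sum_i x_{i1} = y_1$, integrating out $\eta$ in (2.2.9) leaves $\exp(\xi_1 y_1)\,\xi_1^{n}[\exp(\xi_1) - 1]^{-n}$. Then, substituting $u = \exp(\xi_1)$,

$$\begin{aligned}
\int_{-\infty}^{\infty} \exp(\xi_1 y_1)\, \xi_1^{n} [\exp(\xi_1) - 1]^{-n}\, d\xi_1
&= \int_{0}^{\infty} (\log u)^n (u - 1)^{-n} u^{y_1 - 1}\, du \\
&= \int_{0}^{2} \Big(\frac{\log u}{u - 1}\Big)^{n} u^{y_1 - 1}\, du + \int_{2}^{\infty} \Big(\frac{\log u}{u - 1}\Big)^{n} u^{y_1 - 1}\, du \\
&\le \int_{0}^{2} (1 + |\log u|)^{n} u^{y_1 - 1}\, du + 2^n \int_{2}^{\infty} (\log u)^n u^{-2}\, du \\
&= \int_{0}^{2} (1 + |\log u|)^{n} u^{y_1 - 1}\, du + 2^n \int_{\log 2}^{\infty} t^n e^{-t}\, dt < \infty. \qquad (2.2.12)
\end{aligned}$$

On $(0, 2)$ we used $0 < (\log u)/(u - 1) \le 1 + |\log u|$, which follows from $\log u \le u - 1$ and $u \log u \ge u - 1$. The first integral is finite because $y_1 \ge 1$. On $[2, \infty)$ we used $u - 1 \ge u/2$ and $y_1 \le n - 1$.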
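The closed form (2.2.11) is easy to confirm numerically. The sketch below uses Python with NumPy. The integration range and grid size are assumptions, chosen so that the truncated tails are negligible. It rewrites the integrand as $F(\eta_i)\,\bar F(\eta_i + \xi_1)$ with $F$ logistic and applies the trapezoidal rule.

```python
import numpy as np


def logistic(x):
    """Logistic distribution function, computed stably via tanh."""
    return 0.5 * (1.0 + np.tanh(np.asarray(x, dtype=float) / 2.0))


def trapezoid(values, grid):
    """Composite trapezoidal rule."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


ETA = np.linspace(-60.0, 60.0, 24001)


def lhs_2211(xi):
    """Integral over eta of exp(eta) / ({1 + exp(eta + xi)}{1 + exp(eta)}),
    written as logistic(eta) * logistic(-eta - xi)."""
    return trapezoid(logistic(ETA) * logistic(-ETA - xi), ETA)


def rhs_2211(xi):
    """Closed form (2.2.11): xi / (exp(xi) - 1), with limiting value 1 at xi = 0."""
    return 1.0 if xi == 0.0 else float(xi / np.expm1(xi))


for xi in (-3.0, 0.0, 0.7, 4.0):
    print(xi, lhs_2211(xi), rhs_2211(xi))
```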




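Next consider the probit link with $k = 2$. The condition of the theorem says that every data point is either $(1, 0)$ or $(0, 1)$. For this link the posterior is

$$\pi(\eta, \xi_1 \mid x) \propto \prod_{i=1}^{n} \Phi^{x_{i1}}(\eta_i + \xi_1)\, \bar\Phi^{\,1 - x_{i1}}(\eta_i + \xi_1)\, \Phi^{x_{i2}}(\eta_i)\, \bar\Phi^{\,1 - x_{i2}}(\eta_i), \qquad (2.2.13)$$

where $\bar\Phi = 1 - \Phi$. We want to show that

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \pi(\eta, \xi_1 \mid x)\, d\eta\, d\xi_1 < \infty.$$

Let $n_{10}$ be the number of subjects for which $x_{i1} = 1, x_{i2} = 0$, and $n_{01}$ the number of subjects for which $x_{i1} = 0, x_{i2} = 1$. The condition $0 < y_1 < n$ gives $n_{10} = y_1 \ge 1$ and $n_{01} = n - y_1 \ge 1$. Then

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \pi(\eta, \xi_1 \mid x)\, d\eta\, d\xi_1 = \int_{-\infty}^{\infty} \left[\int_{-\infty}^{\infty} \Phi(\eta + \xi_1)\bar\Phi(\eta)\, d\eta\right]^{n_{10}} \left[\int_{-\infty}^{\infty} \bar\Phi(\eta + \xi_1)\Phi(\eta)\, d\eta\right]^{n_{01}} d\xi_1. \qquad (2.2.14)$$

Now, writing $\phi(u) = \Phi'(u)$ and integrating by parts,

$$\begin{aligned}
\int_{-\infty}^{\infty} \Phi(\eta + \xi_1)\bar\Phi(\eta)\, d\eta
&= \Big[\eta\,\Phi(\eta + \xi_1)\bar\Phi(\eta)\Big]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} \eta\left[\phi(\eta + \xi_1)\bar\Phi(\eta) - \Phi(\eta + \xi_1)\phi(\eta)\right] d\eta \\
&= \int_{-\infty}^{\infty} \eta\,\Phi(\eta + \xi_1)\phi(\eta)\, d\eta - \int_{-\infty}^{\infty} \eta\,\bar\Phi(\eta)\phi(\eta + \xi_1)\, d\eta. \qquad (2.2.15)
\end{aligned}$$

Integrating by parts again, and using $\eta\phi(\eta) = -\phi'(\eta)$,

$$\begin{aligned}
\int_{-\infty}^{\infty} \eta\,\Phi(\eta + \xi_1)\phi(\eta)\, d\eta
&= \int_{-\infty}^{\infty} \phi(\eta + \xi_1)\phi(\eta)\, d\eta \\
&= \int_{-\infty}^{\infty} (2\pi)^{-1}\exp\left(-\tfrac{1}{2}[(\eta + \xi_1)^2 + \eta^2]\right) d\eta \\
&= (2\pi^{1/2})^{-1}\exp\left(-\tfrac{1}{4}\xi_1^2\right). \qquad (2.2.16)
\end{aligned}$$

Next, we integrate by parts and use the standard convolution formula $\int_{-\infty}^{\infty}\bar\Phi(\eta)\phi(\eta + \xi_1)\,d\eta = \Phi(\xi_1/\sqrt{2})$. This gives

$$\begin{aligned}
-\int_{-\infty}^{\infty} \eta\,\bar\Phi(\eta)\phi(\eta + \xi_1)\, d\eta
&= \xi_1\int_{-\infty}^{\infty} \bar\Phi(\eta)\phi(\eta + \xi_1)\, d\eta + \int_{-\infty}^{\infty} \phi(\eta)\phi(\eta + \xi_1)\, d\eta \\
&= \xi_1\Phi(\xi_1/\sqrt{2}) + (2\pi^{1/2})^{-1}\exp\left(-\tfrac{1}{4}\xi_1^2\right). \qquad (2.2.17)
\end{aligned}$$

Hence, combining (2.2.15)-(2.2.17),

$$\int_{-\infty}^{\infty} \Phi(\eta + \xi_1)\bar\Phi(\eta)\, d\eta = \xi_1\Phi(\xi_1/\sqrt{2}) + \pi^{-1/2}\exp\left(-\tfrac{1}{4}\xi_1^2\right). \qquad (2.2.18)$$

Similar calculations give

$$\int_{-\infty}^{\infty} \bar\Phi(\eta + \xi_1)\Phi(\eta)\, d\eta = -\xi_1\bar\Phi(\xi_1/\sqrt{2}) + \pi^{-1/2}\exp\left(-\tfrac{1}{4}\xi_1^2\right). \qquad (2.2.19)$$

Both integrals are positive. For $\xi_1 \ge 0$, (2.2.18) is at most $\pi^{-1/2} + \xi_1$ and (2.2.19) is at most $\pi^{-1/2}\exp(-\xi_1^2/4)$. For $\xi_1 < 0$, (2.2.18) is at most $\pi^{-1/2}\exp(-\xi_1^2/4)$ and (2.2.19) is at most $\pi^{-1/2} + |\xi_1|$. Hence,

$$\begin{aligned}
I &= \int_{-\infty}^{\infty}\left[\pi^{-1/2}\exp\left(-\tfrac{1}{4}\xi_1^2\right) + \xi_1\Phi(\xi_1/\sqrt{2})\right]^{n_{10}}\left[\pi^{-1/2}\exp\left(-\tfrac{1}{4}\xi_1^2\right) - \xi_1\bar\Phi(\xi_1/\sqrt{2})\right]^{n_{01}} d\xi_1 \\
&\le \int_{0}^{\infty}\left[\pi^{-1/2}\exp\left(-\tfrac{1}{4}\xi_1^2\right)\right]^{n_{01}}\left[\pi^{-1/2} + \xi_1\right]^{n_{10}} d\xi_1 + \int_{-\infty}^{0}\left[\pi^{-1/2}\exp\left(-\tfrac{1}{4}\xi_1^2\right)\right]^{n_{10}}\left[\pi^{-1/2} + |\xi_1|\right]^{n_{01}} d\xi_1 < \infty, \qquad (2.2.20)
\end{aligned}$$

since $n_{10} \ge 1$ and $n_{01} \ge 1$.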
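The closed forms (2.2.18) and (2.2.19) can likewise be checked by quadrature. This sketch rests on a few assumptions: Python with NumPy, $\Phi$ built from the standard library's `math.erf`, and an integration range of $[-40, 40]$, which is ample for moderate $\xi_1$.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf, otypes=[float])


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def trapezoid(values, grid):
    """Composite trapezoidal rule."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


# Integration grid for eta; the integrands are negligible beyond |eta| = 40 for moderate xi.
ETA = np.linspace(-40.0, 40.0, 40001)
PHI_ETA = Phi(ETA)


def lhs_2218(xi):
    """Numerical value of the integral of Phi(eta + xi) * (1 - Phi(eta)) over eta."""
    return trapezoid(Phi(ETA + xi) * (1.0 - PHI_ETA), ETA)


def rhs_2218(xi):
    """Closed form (2.2.18): xi * Phi(xi / sqrt 2) + pi^(-1/2) * exp(-xi^2 / 4)."""
    return xi * float(Phi(xi / math.sqrt(2.0))) + math.exp(-xi ** 2 / 4.0) / math.sqrt(math.pi)


def lhs_2219(xi):
    """Numerical value of the integral of (1 - Phi(eta + xi)) * Phi(eta) over eta."""
    return trapezoid((1.0 - Phi(ETA + xi)) * PHI_ETA, ETA)


def rhs_2219(xi):
    """Closed form (2.2.19): -xi * (1 - Phi(xi / sqrt 2)) + pi^(-1/2) * exp(-xi^2 / 4)."""
    return -xi * (1.0 - float(Phi(xi / math.sqrt(2.0)))) + math.exp(-xi ** 2 / 4.0) / math.sqrt(math.pi)


for xi in (-2.0, 0.0, 1.3):
    print(xi, lhs_2218(xi), rhs_2218(xi), lhs_2219(xi), rhs_2219(xi))
```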
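Finally, consider the log-log link. Then the posterior is

$$\pi(\eta, \xi_1 \mid x) \propto \prod_{i:\, x_{i1} = 1,\, x_{i2} = 0} \exp(-\exp(-\eta_i - \xi_1))\{1 - \exp(-\exp(-\eta_i))\} \prod_{i:\, x_{i1} = 0,\, x_{i2} = 1} \exp(-\exp(-\eta_i))\{1 - \exp(-\exp(-\eta_i - \xi_1))\}. \qquad (2.2.21)$$

We want to show that

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \pi(\eta, \xi_1 \mid x)\, d\eta\, d\xi_1 < \infty.$$

Now, writing $z = \exp(-\eta_i)$ and $a = \exp(-\xi_1)$, and integrating by parts,

$$\begin{aligned}
\int_{-\infty}^{\infty} \exp(-\exp(-\eta_i - \xi_1))\{1 - \exp(-\exp(-\eta_i))\}\, d\eta_i
&= \int_{0}^{\infty} \exp(-za)\{1 - \exp(-z)\}\, z^{-1}\, dz \\
&= -\int_{0}^{\infty} \log z\,[-a\exp(-za) + (a + 1)\exp(-z(a + 1))]\, dz \\
&= a\int_{0}^{\infty} \exp(-za)[\log(za) - \log a]\, dz - (a + 1)\int_{0}^{\infty} \exp(-z(a + 1))[\log(z(a + 1)) - \log(a + 1)]\, dz \\
&= \int_{0}^{\infty} \exp(-y)\log y\, dy - \log a\int_{0}^{\infty} \exp(-y)\, dy - \int_{0}^{\infty} \exp(-y)\log y\, dy + \log(a + 1)\int_{0}^{\infty} \exp(-y)\, dy \\
&= \log(1 + a^{-1}) = \log[1 + \exp(\xi_1)]. \qquad (2.2.22)
\end{aligned}$$

Similarly,

$$\int_{-\infty}^{\infty} \exp(-\exp(-\eta_i))\{1 - \exp(-\exp(-\eta_i - \xi_1))\}\, d\eta_i = \log[1 + \exp(-\xi_1)]. \qquad (2.2.23)$$

Thus, using $\log(1 + w) \le w$ for $w \ge 0$,

$$\begin{aligned}
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \pi(\eta, \xi_1 \mid x)\, d\eta\, d\xi_1 &= \int_{-\infty}^{\infty} [\log(1 + \exp(\xi_1))]^{n_{10}}[\log(1 + \exp(-\xi_1))]^{n_{01}}\, d\xi_1 \\
&\le \int_{0}^{\infty} \exp(-\xi_1 n_{01})[\log(1 + \exp(\xi_1))]^{n_{10}}\, d\xi_1 + \int_{-\infty}^{0} \exp(\xi_1 n_{10})[\log(1 + \exp(-\xi_1))]^{n_{01}}\, d\xi_1 < \infty. \qquad (2.2.24)
\end{aligned}$$

This completes the proof.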
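A similar quadrature check applies to (2.2.22) and (2.2.23). This sketch uses Python with NumPy, and the grid limits are assumptions. It computes $1 - F$ with `expm1`, so the right tail of the integrand, which decays only like $\exp(-\eta_i)$, is evaluated accurately.

```python
import numpy as np


def F(x):
    """Extreme value distribution function exp(-exp(-x)) (log-log link)."""
    return np.exp(-np.exp(-np.asarray(x, dtype=float)))


def Fbar(x):
    """1 - F(x) = -expm1(-exp(-x)), accurate in the right tail."""
    return -np.expm1(-np.exp(-np.asarray(x, dtype=float)))


def trapezoid(values, grid):
    """Composite trapezoidal rule."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


# The integrands vanish doubly exponentially on the left and like exp(-eta) on the right.
ETA = np.linspace(-30.0, 60.0, 90001)


def lhs_2222(xi):
    """Integral over eta of F(eta + xi) * (1 - F(eta))."""
    return trapezoid(F(ETA + xi) * Fbar(ETA), ETA)


def lhs_2223(xi):
    """Integral over eta of F(eta) * (1 - F(eta + xi))."""
    return trapezoid(F(ETA) * Fbar(ETA + xi), ETA)


for xi in (-2.0, 0.0, 1.5):
    # Closed forms: log(1 + exp(xi)) and log(1 + exp(-xi)), via logaddexp for stability.
    print(xi, lhs_2222(xi), np.logaddexp(0.0, xi), lhs_2223(xi), np.logaddexp(0.0, -xi))
```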

As an immediate consequence of the above findings, the joint posterior of $(\xi_1, \ldots, \xi_{k-1}) = (\alpha_1 - \alpha_k, \ldots, \alpha_{k-1} - \alpha_k)$ given $x$ may be improper when $t_i = 0$ or $k$ for at least one $i$, or $y_j = 0$ or $n$ for at least one $j$. This is in contrast to the conclusions for the usual one-way normal ANOVA model considered, for example, in Sahu and Gelfand (1995). To see this, suppose $Y_1, \ldots, Y_k$ are independent $N(\mu + \alpha_j, 1)$ ($j = 1, \ldots, k$). If one assigns Laplace's prior $\pi(\mu, \alpha_1, \ldots, \alpha_k) \propto 1$, then the joint posterior of $(\mu, \alpha_1, \ldots, \alpha_k)$ is improper, once again as an immediate consequence of Lemma 1. Yet the joint posterior of $(\mu + \alpha_1, \ldots, \mu + \alpha_k)$ is a product of independent normals. This implies that the joint posterior of $(\alpha_1 - \alpha_k, \ldots, \alpha_{k-1} - \alpha_k) = ((\mu + \alpha_1) - (\mu + \alpha_k), \ldots, (\mu + \alpha_{k-1}) - (\mu + \alpha_k))$ is proper. This difference seems mainly due to the different likelihood and link structures of the normal and item response models. Sahu and Gelfand (1995) provide a detailed discussion of how the estimability of parameters can be linked with the propriety or otherwise of posteriors for the normal model. A similar result seems unavailable here, due to the inherent nonlinear structure of the model.

Clearly, Lemma 1 and the reparametrization introduced in (2.2.4) give a necessary and sufficient condition for the posterior of $(\eta, \xi, \alpha_k)$ given $x$ to be proper. This is equivalent to the propriety of the posterior of $(\theta, \alpha)$ given $x$. However, verifying the conditions of Lemma 1 after the necessary reparametrization need not always be easy. We find it more convenient to verify the propriety of the posteriors of $(\theta, \alpha)$ by direct calculation for the logit, probit and log-log links. Specifically, consider the prior $\pi(\theta, \alpha) \propto g(\theta)$, where $g$ is a proper pdf. The following theorem provides specific conditions under which $\pi(\theta, \alpha \mid x)$ is proper.
Theorem  4. 

Assume the conditions (I) $g(\theta)$ has a finite moment generating function (mgf) and (II) $1 \le y_j \le n - 1$ for all $j = 1, \ldots, k$. Then, for the noninformative prior on $\alpha$, $\pi(\theta, \alpha \mid x)$ is proper for the link functions (i)-(iii) of Theorem 3.
Proof. 

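(i) First consider the logit link. In this case, the likelihood function is given by

$$L(\theta, \alpha \mid x) = \frac{\exp\Big(\sum_{i=1}^n \theta_i t_i + \sum_{j=1}^k \alpha_j y_j\Big)}{\prod_{i=1}^n \prod_{j=1}^k \big(1 + \exp(\theta_i + \alpha_j)\big)}. \tag{2.2.25}$$

We want to show

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} L(\theta, \alpha \mid x)\, g(\theta) \prod_{i=1}^n d\theta_i \prod_{j=1}^k d\alpha_j < \infty \tag{2.2.26}$$

provided $g(\theta)$ has finite mgf. First we get the inequality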

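$$\begin{aligned}
\int_{-\infty}^{\infty} \frac{\exp(\alpha y_j)}{\prod_{i=1}^n (1 + \exp(\theta_i + \alpha))}\, d\alpha
&= \int_{-\infty}^{0} \frac{\exp(\alpha y_j)}{\prod_{i=1}^n (1 + \exp(\theta_i + \alpha))}\, d\alpha + \int_{0}^{\infty} \frac{\exp(\alpha y_j)}{\prod_{i=1}^n (1 + \exp(\theta_i + \alpha))}\, d\alpha \\
&\le \int_{-\infty}^{0} \exp(\alpha y_j)\, d\alpha + \exp\Big(-\sum_{i=1}^n \theta_i\Big) \int_{0}^{\infty} \exp(-(n - y_j)\alpha)\, d\alpha \\
&= y_j^{-1} + \exp\Big(-\sum_{i=1}^n \theta_i\Big)(n - y_j)^{-1} \le 1 + \exp\Big(-\sum_{i=1}^n \theta_i\Big).
\end{aligned} \tag{2.2.27}$$

Here we used $\prod_{i=1}^n (1 + \exp(\theta_i + \alpha)) \ge 1$ on the first piece, $\prod_{i=1}^n (1 + \exp(\theta_i + \alpha)) \ge \exp(\sum_{i=1}^n \theta_i + n\alpha)$ on the second piece, and condition (II) for the last step.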


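Hence, integrating out each $\alpha_j$ by means of (2.2.27),

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} L(\theta, \alpha \mid x)\, g(\theta) \prod_{i=1}^n d\theta_i \prod_{j=1}^k d\alpha_j \le \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\Big(\sum_{i=1}^n \theta_i t_i\Big) \Big\{1 + \exp\Big(-\sum_{i=1}^n \theta_i\Big)\Big\}^k g(\theta) \prod_{i=1}^n d\theta_i < \infty, \tag{2.2.28}$$

due to the finiteness of the mgf of $g$. This proves (i).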
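The bound (2.2.27) is easy to check numerically. The sketch below is an illustration added here, not part of the original argument: it approximates the left-hand side of (2.2.27) by a midpoint rule on a truncated range and compares it with the middle expression $y_j^{-1} + \exp(-\sum_i \theta_i)(n - y_j)^{-1}$. The ability values are hypothetical, and the truncation range and number of steps are arbitrary choices.

```python
import math

def softplus(z):
    """log(1 + e^z), computed without overflow."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def lhs_integral(theta, y, lo=-60.0, hi=60.0, steps=20_000):
    """Midpoint rule for the integral over alpha of
    exp(alpha * y) / prod_i (1 + exp(theta_i + alpha)), truncated to [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps):
        a = lo + (s + 0.5) * h
        total += math.exp(a * y - sum(softplus(t + a) for t in theta))
    return total * h

def rhs_bound(theta, y):
    """Middle expression of (2.2.27): 1/y + exp(-sum(theta)) / (n - y)."""
    n = len(theta)
    return 1.0 / y + math.exp(-sum(theta)) / (n - y)

theta = [0.3, -1.2, 0.8, 2.0]        # hypothetical abilities, n = 4
for y in (1, 2, 3):                  # condition (II): 1 <= y_j <= n - 1
    print(y, lhs_integral(theta, y), rhs_bound(theta, y))
```

Each integral should fall below its bound. With $y_j = 0$ the piece over $(-\infty, 0]$ diverges, and with $y_j = n$ the piece over $[0, \infty)$ diverges, which is why condition (II) is needed.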
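(ii) In this case

$$L(\theta, \alpha \mid x) = \prod_{i=1}^n \prod_{j=1}^k \Big[\Phi^{x_{ij}}(\theta_i + \alpha_j)\{1 - \Phi(\theta_i + \alpha_j)\}^{1 - x_{ij}}\Big]. \tag{2.2.29}$$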

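First, using the inequality $\frac{1}{2}[\exp(t) + \exp(-t)] \le \exp(t^2/2)$ for all real $t$, note that for $\alpha_j > 0$,

$$\begin{aligned}
1 - \Phi(\theta_i + \alpha_j) &= \int_{\theta_i + \alpha_j}^{\infty} (2\pi)^{-1/2} \exp(-t^2/2)\, dt \\
&\le (2\pi)^{-1/2} \int_{\theta_i + \alpha_j}^{\infty} 2(e^{t} + e^{-t})^{-1}\, dt \\
&\le (2/\pi)^{1/2} \int_{\theta_i + \alpha_j}^{\infty} e^{-t}\, dt \\
&= (2/\pi)^{1/2} \exp(-\theta_i - \alpha_j).
\end{aligned} \tag{2.2.30}$$

Similarly, for $\alpha_j < 0$,

$$\Phi(\theta_i + \alpha_j) \le (2/\pi)^{1/2} \int_{-\infty}^{\theta_i + \alpha_j} e^{t}\, dt = (2/\pi)^{1/2} \exp(\theta_i + \alpha_j). \tag{2.2.31}$$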

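Hence,

$$\begin{aligned}
\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} L(\theta, \alpha \mid x)\, g(\theta) \prod_{i=1}^n d\theta_i \prod_{j=1}^k d\alpha_j
&\le \int \cdots \int g(\theta) \prod_{j=1}^k \Big[\int_{-\infty}^{0} \prod_{i=1}^n \big\{(2/\pi)^{1/2} \exp(\theta_i + \alpha_j)\big\}^{x_{ij}} d\alpha_j \\
&\qquad + \int_{0}^{\infty} \prod_{i=1}^n \big\{(2/\pi)^{1/2} \exp(-\theta_i - \alpha_j)\big\}^{1 - x_{ij}} d\alpha_j\Big]\, d\theta_1 \cdots d\theta_n \\
&= \int \cdots \int g(\theta) \prod_{j=1}^k \Big[(2/\pi)^{y_j/2} \exp\Big(\sum_{i=1}^n x_{ij}\theta_i\Big) \int_{-\infty}^{0} \exp(\alpha y_j)\, d\alpha \\
&\qquad + (2/\pi)^{(n - y_j)/2} \exp\Big(-\sum_{i=1}^n (1 - x_{ij})\theta_i\Big) \int_{0}^{\infty} \exp(-(n - y_j)\alpha)\, d\alpha\Big] \prod_{i=1}^n d\theta_i \\
&\le (2/\pi)^{k/2} \int \cdots \int g(\theta) \prod_{j=1}^k \Big[\exp\Big(\sum_{i=1}^n x_{ij}\theta_i\Big) y_j^{-1} + \exp\Big(-\sum_{i=1}^n (1 - x_{ij})\theta_i\Big)(n - y_j)^{-1}\Big] \prod_{i=1}^n d\theta_i \\
&\le (2/\pi)^{k/2} \int \cdots \int g(\theta) \prod_{j=1}^k \Big[\exp\Big(\sum_{i=1}^n x_{ij}\theta_i\Big) + \exp\Big(-\sum_{i=1}^n (1 - x_{ij})\theta_i\Big)\Big] \prod_{i=1}^n d\theta_i \\
&= (2/\pi)^{k/2} \int \cdots \int g(\theta) \exp\Big(\sum_{i=1}^n t_i\theta_i\Big) \Big\{1 + \exp\Big(-\sum_{i=1}^n \theta_i\Big)\Big\}^k \prod_{i=1}^n d\theta_i \\
&< \infty,
\end{aligned} \tag{2.2.32}$$

where the factor $(2/\pi)^{k/2}$ arises because $2/\pi < 1$ and $y_j \ge 1$, $n - y_j \ge 1$ by condition (II). The final bound is finite due to the finiteness of the mgf of $g(\theta)$. This proves (ii).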
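The two tail bounds (2.2.30) and (2.2.31) used in part (ii) can be spot-checked with the standard library's `math.erfc`, since $\Phi(z) = \frac{1}{2}\operatorname{erfc}(-z/\sqrt{2})$. The derivation above actually shows that both bounds hold for every real argument $z = \theta_i + \alpha_j$, so the sketch checks them on a grid of $z$ values; the grid is an arbitrary choice, and this is a sanity check rather than a proof.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)

def upper_tail(z):
    """1 - Phi(z), computed with erfc to avoid cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def lower_tail(z):
    """Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

grid = [s / 10.0 for s in range(-50, 101)]     # z from -5 to 10

# (2.2.30): 1 - Phi(z) <= sqrt(2/pi) * exp(-z)
ok_upper = all(upper_tail(z) <= SQRT_2_OVER_PI * math.exp(-z) for z in grid)
# (2.2.31): Phi(z) <= sqrt(2/pi) * exp(z), checked at the mirrored points -z
ok_lower = all(lower_tail(-z) <= SQRT_2_OVER_PI * math.exp(-z) for z in grid)
print(ok_upper, ok_lower)
```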
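(iii) In this case

$$L(\theta, \alpha \mid x) = \exp\Big(-\sum_{i=1}^n \sum_{j=1}^k x_{ij} \exp(-\theta_i - \alpha_j)\Big) \times \prod_{i=1}^n \prod_{j=1}^k \big[1 - \exp\{-\exp(-\theta_i - \alpha_j)\}\big]^{1 - x_{ij}}. \tag{2.2.33}$$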
t=i j=i 

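We want to show that the integral

$$I = \int_{\mathbb{R}^n} \int_{\mathbb{R}^k} g(\theta) L(\theta, \alpha \mid x)\, d\alpha\, d\theta < \infty. \tag{2.2.34}$$

But

$$I \le \int_{\mathbb{R}^n} g(\theta) \prod_{j=1}^k \Big[\int_{-\infty}^{0} \exp\Big(-\sum_{i=1}^n x_{ij} \exp(-\theta_i - \alpha_j)\Big) d\alpha_j + \int_{0}^{\infty} \prod_{i=1}^n \big[1 - \exp\{-\exp(-\theta_i - \alpha_j)\}\big]^{1 - x_{ij}} d\alpha_j\Big] d\theta. \tag{2.2.35}$$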


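Now, using $\exp(-u) \le u^{-1}$ for $u > 0$,

$$\begin{aligned}
\int_{-\infty}^{0} \exp\Big(-\sum_{i=1}^n x_{ij} \exp(-\theta_i - \alpha_j)\Big) d\alpha_j
&= \int_{-\infty}^{0} \exp\Big(-\sum_{i=1}^n x_{ij} \exp(-\theta_i) \exp(-\alpha_j)\Big) d\alpha_j \\
&\le \int_{-\infty}^{0} \Big(\sum_{i=1}^n x_{ij} \exp(-\theta_i)\Big)^{-1} \exp(\alpha_j)\, d\alpha_j \\
&= \Big(\sum_{i=1}^n x_{ij} \exp(-\theta_i)\Big)^{-1} \le \Big[\max_{1 \le i \le n} x_{ij} \exp(-\theta_i)\Big]^{-1} \le \exp\Big(\sum_{i=1}^n |\theta_i|\Big).
\end{aligned} \tag{2.2.36}$$

(Since $y_j \ge 1$, at least one $x_{ij}$ equals 1, so the maximum is positive.) Also, using $1 - \exp(-u) \le u$ for all $u > 0$,

$$\begin{aligned}
\int_{0}^{\infty} \prod_{i=1}^n \big[1 - \exp\{-\exp(-\theta_i - \alpha_j)\}\big]^{1 - x_{ij}} d\alpha_j
&\le \int_{0}^{\infty} \prod_{i=1}^n \{\exp(-\theta_i - \alpha_j)\}^{1 - x_{ij}} d\alpha_j \\
&= \exp\Big(-\sum_{i=1}^n \theta_i (1 - x_{ij})\Big) \int_{0}^{\infty} \exp(-\alpha_j (n - y_j))\, d\alpha_j \\
&= \exp\Big(-\sum_{i=1}^n \theta_i (1 - x_{ij})\Big)(n - y_j)^{-1} \le \exp\Big(\sum_{i=1}^n |\theta_i|\Big).
\end{aligned} \tag{2.2.37}$$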


Hence, from (2.2.35)-(2.2.37),

$$I \le \int_{\mathbb{R}^n} g(\theta) \Big[2 \exp\Big(\sum_{i=1}^n |\theta_i|\Big)\Big]^k d\theta < \infty, \tag{2.2.38}$$

due to the finiteness of the mgf of $g(\theta)$. This proves (iii), and completes the proof of Theorem 4.
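The two one-dimensional integrals bounded in (2.2.36) and (2.2.37) can also be evaluated numerically for a concrete case. The sketch below uses one hypothetical item column $x_{\cdot j}$ and hypothetical abilities, applies a midpoint rule, and compares the results with the intermediate and final bounds. Truncating the range of $\alpha_j$ at $\pm 50$ is an assumption of the sketch; it is harmless here because both integrands decay very quickly.

```python
import math

def midpoint(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (s + 0.5) * h) for s in range(steps))

theta = [0.4, -0.7, 1.5, -2.0, 0.1]     # hypothetical abilities, n = 5
x_col = [1, 0, 1, 0, 0]                 # hypothetical responses to item j
n, y = len(theta), sum(x_col)           # y_j = 2, so 1 <= y_j <= n - 1

# Left-hand side of (2.2.36): alpha_j over (-inf, 0], truncated at -50.
c = sum(x * math.exp(-t) for x, t in zip(x_col, theta))
lhs_36 = midpoint(lambda a: math.exp(-c * math.exp(-a)), -50.0, 0.0)

# Left-hand side of (2.2.37): alpha_j over [0, inf), truncated at 50.
def integrand_37(a):
    prod = 1.0
    for x, t in zip(x_col, theta):
        if x == 0:
            prod *= -math.expm1(-math.exp(-t - a))   # 1 - exp(-exp(-t - a))
    return prod

lhs_37 = midpoint(integrand_37, 0.0, 50.0)

bound = math.exp(sum(abs(t) for t in theta))         # exp(sum_i |theta_i|)
print(lhs_36, 1.0 / c, lhs_37, bound)
```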

Before concluding this section, we establish an invariance property of the posterior means of the elementary contrasts $\alpha_j - \alpha_m$ ($1 \le j \ne m \le k$) for general link functions, the location-scale family of priors $g_{\mu,\sigma}(\theta) = \sigma^{-n} g((\theta_1 - \mu)/\sigma, \ldots, (\theta_n - \mu)/\sigma)$ for $\theta$, and an independent flat prior for $\alpha$. The following theorem shows that the Bayes estimate of $\alpha_j - \alpha_m$ is location invariant, that is, it does not depend on the choice of $\mu$. Thus, as in Section 2.4, if one is interested in inference about the elementary contrasts under a location-scale prior, $\mu$ can be taken as 0 without loss of generality.
Theorem  5. 

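Consider the likelihood given in (2.2.1) and the prior

$$\pi(\theta, \alpha) \propto \sigma^{-n} g\big((\theta_1 - \mu)/\sigma, \ldots, (\theta_n - \mu)/\sigma\big). \tag{2.2.39}$$

Then the posterior $\pi_{\mu,\sigma}(\alpha \mid x)$ of $\alpha$ equals $\pi_{0,\sigma}(\gamma \mid x)$, where $\gamma = (\gamma_1, \ldots, \gamma_k)$ and $\gamma_j = \mu + \alpha_j$ ($j = 1, \ldots, k$).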

Proof. 

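Let $z_i = \theta_i - \mu$ ($i = 1, \ldots, n$) and $z = (z_1, \ldots, z_n)$, and write $\bar F = 1 - F$. Since $\theta_i + \alpha_j = z_i + \gamma_j$,

$$\begin{aligned}
\pi_{\mu,\sigma}(\alpha \mid x) &= \frac{\int_{\mathbb{R}^n} L(\theta, \alpha \mid x)\, \pi(\theta, \alpha)\, d\theta}{\int_{\mathbb{R}^k} \int_{\mathbb{R}^n} L(\theta, \alpha \mid x)\, \pi(\theta, \alpha)\, d\theta\, d\alpha} \\
&= \frac{\int_{\mathbb{R}^n} \prod_{i=1}^n \prod_{j=1}^k \{F^{x_{ij}}(\theta_i + \alpha_j) \bar F^{1 - x_{ij}}(\theta_i + \alpha_j)\}\, \sigma^{-n} g\big(\frac{\theta_1 - \mu}{\sigma}, \ldots, \frac{\theta_n - \mu}{\sigma}\big)\, d\theta}{\int_{\mathbb{R}^k} \int_{\mathbb{R}^n} \prod_{i=1}^n \prod_{j=1}^k \{F^{x_{ij}}(\theta_i + \alpha_j) \bar F^{1 - x_{ij}}(\theta_i + \alpha_j)\}\, \sigma^{-n} g\big(\frac{\theta_1 - \mu}{\sigma}, \ldots, \frac{\theta_n - \mu}{\sigma}\big)\, d\theta\, d\alpha} \\
&= \frac{\int_{\mathbb{R}^n} \prod_{i=1}^n \prod_{j=1}^k \{F^{x_{ij}}(z_i + \gamma_j) \bar F^{1 - x_{ij}}(z_i + \gamma_j)\}\, \sigma^{-n} g\big(\frac{z_1}{\sigma}, \ldots, \frac{z_n}{\sigma}\big)\, dz}{\int_{\mathbb{R}^k} \int_{\mathbb{R}^n} \prod_{i=1}^n \prod_{j=1}^k \{F^{x_{ij}}(z_i + \gamma_j) \bar F^{1 - x_{ij}}(z_i + \gamma_j)\}\, \sigma^{-n} g\big(\frac{z_1}{\sigma}, \ldots, \frac{z_n}{\sigma}\big)\, dz\, d\gamma} \\
&= \pi_{0,\sigma}(\gamma \mid x).
\end{aligned}$$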

Remark 3. Since $\alpha_j - \alpha_m = \gamma_j - \gamma_m$ ($1 \le j \ne m \le k$), the joint posterior of these contrasts does not depend on $\mu$.




2.3    Implementation  Of  Bayes  Procedures 

Consider the general link $p_{ij} = F(\theta_i + \alpha_j)$, $i = 1, \ldots, n$, $j = 1, \ldots, k$, and the prior $\pi(\theta, \alpha) \propto \prod_{i=1}^n g_1(\theta_i) \prod_{j=1}^k g_2(\alpha_j)$. Both $g_1$ and $g_2$ can be improper as long as the posterior $\pi(\theta, \alpha \mid x)$ remains proper. However, due to the nonconjugacy of the prior, the posterior is analytically intractable and can be found only via numerical integration. Moreover, direct numerical integration seems infeasible in this case because of the high dimensionality of the problem.

Fortunately, the integration task has become easier with the advent of sophisticated Markov chain Monte Carlo (MCMC) numerical integration techniques. In particular, we shall use Gibbs sampling for integration purposes. Gibbs sampling, first introduced by Geman and Geman (1984), has received considerable attention in recent years, especially in Bayesian analysis, mainly due to the landmark papers of Gelfand and Smith (1990) and Gelfand et al. (1990). The method is described below.

Gibbs sampling is a Markovian updating scheme. Given an arbitrary starting set of values $U_1^{(0)}, \ldots, U_m^{(0)}$, we draw $U_1^{(1)} \sim [U_1 \mid U_2^{(0)}, \ldots, U_m^{(0)}]$, $U_2^{(1)} \sim [U_2 \mid U_1^{(1)}, U_3^{(0)}, \ldots, U_m^{(0)}]$, $\ldots$, $U_m^{(1)} \sim [U_m \mid U_1^{(1)}, \ldots, U_{m-1}^{(1)}]$, where $[\cdot \mid \cdot]$ denotes the relevant conditional distributions. Thus each variable is visited in turn, and a cycle of this scheme requires the generation of $m$ random variables. After $t$ such iterations we obtain $(U_1^{(t)}, \ldots, U_m^{(t)})$. As $t \to \infty$, $(U_1^{(t)}, \ldots, U_m^{(t)}) \xrightarrow{d} (U_1, \ldots, U_m)$. Gibbs sampling through $q$ replications of the above $t$ iterations generates $q$ iid $m$-vectors $(U_{1j}^{(t)}, \ldots, U_{mj}^{(t)})$, $j = 1, \ldots, q$. Here $U_1, \ldots, U_m$ could be real or vector valued.
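To make the updating scheme concrete, the sketch below runs a systematic-scan Gibbs sampler on a toy target chosen only for illustration (it is not a model from this chapter): a standard bivariate normal with correlation $\rho$, whose full conditionals are $U_1 \mid U_2 \sim N(\rho U_2, 1 - \rho^2)$ and $U_2 \mid U_1 \sim N(\rho U_1, 1 - \rho^2)$. Each cycle draws $m = 2$ variables, and, as described above, $q$ independent replications of $t$ iterations yield $q$ approximately iid draws. The function name, starting values and seed are arbitrary.

```python
import random

def gibbs_bivariate_normal(rho, t, q, start=(5.0, -5.0), seed=1):
    """Run q independent replications of t Gibbs cycles for a standard bivariate
    normal with correlation rho; return the final state of each replication."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5            # conditional standard deviation
    finals = []
    for _ in range(q):
        u1, u2 = start                       # arbitrary starting values U^(0)
        for _ in range(t):
            u1 = rng.gauss(rho * u2, sd)     # U1^(s) ~ [U1 | U2^(s-1)]
            u2 = rng.gauss(rho * u1, sd)     # U2^(s) ~ [U2 | U1^(s)]
        finals.append((u1, u2))
    return finals

finals = gibbs_bivariate_normal(rho=0.8, t=50, q=2000)
mean_u1 = sum(u1 for u1, _ in finals) / len(finals)
mean_prod = sum(u1 * u2 for u1, u2 in finals) / len(finals)   # estimates E[U1 U2] = rho
print(round(mean_u1, 3), round(mean_prod, 3))
```

Even from the deliberately poor start $(5, -5)$, the dependence on the starting point dies out geometrically here, so 50 cycles per replication are ample.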

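Gibbs sampling consists of finding the conditional distribution of every parameter given the remaining parameters and the data. In this case the full conditionals are given by

$$\pi(\theta_i \mid \theta_l\ (l \ne i), \alpha, x) \propto \prod_{j=1}^k \big[F^{x_{ij}}(\theta_i + \alpha_j) \bar F^{1 - x_{ij}}(\theta_i + \alpha_j)\big] g_1(\theta_i); \tag{2.3.1}$$

$$\pi(\alpha_j \mid \alpha_m\ (m \ne j), \theta, x) \propto \prod_{i=1}^n \big[F^{x_{ij}}(\theta_i + \alpha_j) \bar F^{1 - x_{ij}}(\theta_i + \alpha_j)\big] g_2(\alpha_j); \tag{2.3.2}$$

$i = 1, \ldots, n$; $j = 1, \ldots, k$, where $\bar F = 1 - F$. Note that the full conditional of $\theta_i$ does not involve the remaining $\theta_l$ ($l \ne i$), and the full conditional of $\alpha_j$ does not involve the remaining $\alpha_m$ ($m \ne j$).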

Notice, however, that the full conditionals are also non-standard densities from which it is not possible to draw samples directly. A general procedure for generating samples in such cases is to use the Metropolis-Hastings accept-reject algorithm. If, however, $F$, $\bar F$, $g_1$ and $g_2$ are all log-concave, the full conditionals $\pi(\theta_i \mid \cdot)$ and $\pi(\alpha_j \mid \cdot)$ are all log-concave. One can then use the adaptive rejection sampling of Gilks and Wild (1992). Adaptive rejection sampling is a useful technique for sampling from log-concave densities. Owing to its adaptive nature, it reduces the number of evaluations of the function from which the sample is drawn. This is especially useful in situations where the evaluation of the function is computationally expensive.
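A minimal sketch of a sampler built from the full conditionals (2.3.1) and (2.3.2) is given below for the logit link, with $g_1$ the $N(0, \sigma^2)$ density and $g_2 \equiv 1$. As a simpler stand-in for adaptive rejection sampling, each full conditional is updated with a single random-walk Metropolis step (Metropolis-within-Gibbs), which targets the same posterior but is not the method used for the results in this chapter. The function names, the step size and the small synthetic data set are hypothetical choices made for illustration.

```python
import math
import random

def log_expit(z):
    """log F(z) for the logistic cdf F, computed stably; log(1 - F(z)) = log_expit(-z)."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def loglik_cells(z_values, x_values):
    """Sum of x log F(z) + (1 - x) log(1 - F(z)) over the given cells."""
    return sum(log_expit(z) if x == 1 else log_expit(-z) for z, x in zip(z_values, x_values))

def rw_metropolis_step(current, log_target, rng, step):
    """One random-walk Metropolis update for a density proportional to exp(log_target)."""
    proposal = current + rng.gauss(0.0, step)
    if math.log(1.0 - rng.random()) < log_target(proposal) - log_target(current):
        return proposal
    return current

def metropolis_within_gibbs(x, sigma, n_iter, step=0.8, seed=0):
    """Draws of alpha for p_ij = F(theta_i + alpha_j), theta_i ~ N(0, sigma^2), flat prior on alpha."""
    rng = random.Random(seed)
    n, k = len(x), len(x[0])
    theta, alpha = [0.0] * n, [0.0] * k
    draws = []
    for _ in range(n_iter):
        for i in range(n):   # full conditional (2.3.1) with g_1 = N(0, sigma^2)
            theta[i] = rw_metropolis_step(
                theta[i],
                lambda t: loglik_cells([t + a for a in alpha], x[i]) - t * t / (2.0 * sigma ** 2),
                rng, step)
        for j in range(k):   # full conditional (2.3.2) with flat g_2
            col = [row[j] for row in x]
            alpha[j] = rw_metropolis_step(
                alpha[j],
                lambda a: loglik_cells([t + a for t in theta], col),
                rng, step)
        draws.append(list(alpha))
    return draws

# Hypothetical synthetic data: 60 subjects, 3 items with alpha = (1, 0, -1).
gen = random.Random(42)
true_alpha = [1.0, 0.0, -1.0]
x = []
for _ in range(60):
    ability = gen.gauss(0.0, 1.0)
    x.append([int(gen.random() < 1.0 / (1.0 + math.exp(-(ability + a)))) for a in true_alpha])

draws = metropolis_within_gibbs(x, sigma=1.0, n_iter=1500)[500:]   # discard burn-in
contrast = [d[0] - d[2] for d in draws]
print("posterior mean of alpha_1 - alpha_3:", sum(contrast) / len(contrast))
```

Only the contrast is summarized, in line with the focus on elementary contrasts in Section 2.2; for another link one would replace `log_expit` by the corresponding $\log F$ and $\log \bar F$.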

Adaptive rejection sampling improves on general rejection sampling in terms of computational time. Rejection sampling is a general method for sampling points independently from a density, say $f(x)$. The density $f(x)$ need not be completely specified, i.e., rejection sampling may be performed using $g(x)$ instead of $f(x)$, where $g(x) = cf(x)$ for some possibly unknown value of $c$. Rejection sampling involves calculating an envelope function of $g(x)$, which is often done by locating the supremum of $g(x)$ using a standard optimization technique. Adaptive rejection sampling, however, avoids the need to locate the supremum of $g(x)$ by assuming log-concavity of $f(x)$. This reduces computational time considerably. It also cuts the time down by updating the information about $g(x)$ gathered at the previous steps of the iteration.
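As an illustration of the plain (non-adaptive) rejection sampler just described, the sketch below draws from a density known only up to a constant, $g(x) = cf(x)$, on a bounded interval, using uniform proposals and a constant envelope at a height $M$ obtained by locating the supremum of $g$. The target $g(x) = x^2(1 - x)^3$ on $[0, 1]$ (an unnormalized Beta(3, 4) density), the grid search used to find the supremum and the small safety factor are illustrative choices. Adaptive rejection sampling would instead build a piecewise-exponential envelope from tangents to $\log g$, which is where log-concavity is needed.

```python
import random

def rejection_sample(g, lo, hi, size, seed=0, grid=10_001, safety=1.01):
    """Draw `size` points from the density proportional to g on [lo, hi]."""
    rng = random.Random(seed)
    # Envelope: a constant M at least sup g, found by a grid search and then
    # inflated slightly to allow for the grid's resolution.
    M = safety * max(g(lo + (hi - lo) * s / (grid - 1)) for s in range(grid))
    out = []
    while len(out) < size:
        x = rng.uniform(lo, hi)            # proposal from the uniform density
        if rng.random() * M <= g(x):       # accept with probability g(x) / M
            out.append(x)
    return out

def g(x):
    """Beta(3, 4) density up to the unknown constant c."""
    return x ** 2 * (1.0 - x) ** 3

draws = rejection_sample(g, 0.0, 1.0, size=20_000)
print(sum(draws) / len(draws))   # should be close to the Beta(3, 4) mean 3/7
```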

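To see the log-concavity of $\pi(\theta_i \mid \cdot)$ and $\pi(\alpha_j \mid \cdot)$, simply write

$$\log \pi(\theta_i \mid \theta_l\ (l \ne i), \alpha, x) = \sum_{j=1}^k \big[x_{ij} \log F(\theta_i + \alpha_j) + (1 - x_{ij}) \log \bar F(\theta_i + \alpha_j)\big] + \log g_1(\theta_i) + \text{const}. \tag{2.3.3}$$

If $F$, $\bar F$ and $g_1$ are all log-concave, then clearly $\pi(\theta_i \mid \theta_l\ (l \ne i), \alpha, x)$ is log-concave. Similarly, if $F$, $\bar F$ and $g_2$ are log-concave, $\pi(\alpha_j \mid \alpha_m\ (m \ne j), \theta, x)$ is log-concave. The log-concavity of $F$ and $\bar F$ is ensured if $F$ is an IFR df.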
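For the three links used in this chapter, the log-concavity of $F$ and $\bar F$ can be spot-checked numerically by verifying that second differences of $\log F$ and $\log \bar F$ are non-positive on a grid. The log-log cdf is taken as $F(z) = \exp(-e^{-z})$, matching (2.2.33); the grid, step and tolerance are arbitrary choices, and a grid check of this kind illustrates the property but does not prove it.

```python
import math

def log_expit(z):
    """log F(z) for the logistic cdf."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

LINKS = {
    # name: (log F, log Fbar), with Fbar = 1 - F
    "logit": (log_expit, lambda z: log_expit(-z)),
    "probit": (lambda z: math.log(0.5 * math.erfc(-z / math.sqrt(2.0))),
               lambda z: math.log(0.5 * math.erfc(z / math.sqrt(2.0)))),
    "log-log": (lambda z: -math.exp(-z),
                lambda z: math.log(-math.expm1(-math.exp(-z)))),
}

def is_concave_on_grid(f, lo=-8.0, hi=8.0, h=0.01, tol=1e-12):
    """True if every second difference f(z - h) - 2 f(z) + f(z + h) is <= tol."""
    steps = int(round((hi - lo) / h))
    zs = [lo + s * h for s in range(1, steps)]
    return all(f(z - h) - 2.0 * f(z) + f(z + h) <= tol for z in zs)

for name, (log_F, log_Fbar) in LINKS.items():
    print(name, is_concave_on_grid(log_F), is_concave_on_grid(log_Fbar))
```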

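It should also be noted that the full conditionals $\pi(\theta_i \mid \theta_l\ (l \ne i), \alpha, x)$ ($i = 1, \ldots, n$) and $\pi(\alpha_j \mid \alpha_m\ (m \ne j), \theta, x)$ ($j = 1, \ldots, k$) can all be proper, and yet the posterior $\pi(\theta, \alpha \mid x)$ can be improper (cf. Casella and George (1992)). To see this for the Rasch model, recall that the link function is $F(z) = \exp(z)/[1 + \exp(z)]$. Now, if one assigns the flat prior $\pi(\theta, \alpha) \propto 1$, then from the general result proved in Theorem 1, $\pi(\theta, \alpha \mid x)$ is improper. But

$$\pi(\theta_i \mid \theta_l\ (l \ne i), \alpha, x) \propto \frac{\exp(\theta_i t_i)}{\prod_{j=1}^k [1 + \exp(\theta_i + \alpha_j)]}. \tag{2.3.4}$$

Now, substituting $z$ for $\exp(\theta_i)$,

$$\begin{aligned}
\int_{-\infty}^{\infty} \exp(\theta_i t_i) \prod_{j=1}^k [1 + \exp(\theta_i + \alpha_j)]^{-1} d\theta_i &= \int_{0}^{\infty} z^{t_i - 1} \prod_{j=1}^k [1 + z \exp(\alpha_j)]^{-1} dz \\
&\le \int_{0}^{1} z^{t_i - 1} dz + \exp\Big(-\sum_{j=1}^k \alpha_j\Big) \int_{1}^{\infty} z^{t_i - k - 1} dz.
\end{aligned} \tag{2.3.5}$$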


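Now, if $1 \le t_i \le k - 1$ for all $i = 1, \ldots, n$,

$$\text{right-hand side of (2.3.5)} = t_i^{-1} + \exp\Big(-\sum_{j=1}^k \alpha_j\Big)(k - t_i)^{-1} < \infty. \tag{2.3.6}$$

Similarly, if $1 \le y_j \le n - 1$ ($j = 1, \ldots, k$),

$$\int_{-\infty}^{\infty} \exp(\alpha_j y_j) \prod_{i=1}^n [1 + \exp(\theta_i + \alpha_j)]^{-1} d\alpha_j < \infty, \quad j = 1, \ldots, k. \tag{2.3.7}$$

Hence all the full conditionals are proper, even though the joint posterior is improper.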
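One way to see numerically why the joint posterior fails to be proper while the full conditionals are fine is sketched below for a hypothetical $3 \times 3$ Rasch data set. Adding $c$ to every $\theta_i$ and subtracting $c$ from every $\alpha_j$ leaves each $\theta_i + \alpha_j$, and hence the likelihood, unchanged, so under the flat prior the unnormalized posterior is constant along an unbounded line. Yet the integral of the full conditional (2.3.4) of $\theta_1$ is finite and lies below the bound (2.3.6). The data, parameter values and integration range are illustrative assumptions.

```python
import math

def softplus(z):
    """log(1 + e^z), computed stably."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def rasch_loglik(theta, alpha, x):
    """Log-likelihood of p_ij = F(theta_i + alpha_j) with logistic F."""
    return sum(x[i][j] * (theta[i] + alpha[j]) - softplus(theta[i] + alpha[j])
               for i in range(len(theta)) for j in range(len(alpha)))

x = [[1, 0, 1], [0, 0, 1], [1, 1, 0]]                 # hypothetical 3 x 3 data
theta, alpha = [0.2, -0.5, 1.0], [0.3, -0.4, 0.8]     # hypothetical parameter values

# The likelihood is unchanged along (theta + c, alpha - c) for every c.
shifted = [rasch_loglik([t + c for t in theta], [a - c for a in alpha], x)
           for c in (-5.0, 0.0, 5.0)]

# The full conditional (2.3.4) of theta_1 nevertheless integrates to a finite value.
t1 = sum(x[0])                                        # t_1 = 2, so 1 <= t_1 <= k - 1
h, lo, steps = 0.002, -40.0, 40_000
integral = h * sum(
    math.exp(t1 * th - sum(softplus(th + a) for a in alpha))
    for th in (lo + (s + 0.5) * h for s in range(steps)))
bound = 1.0 / t1 + math.exp(-sum(alpha)) / (len(alpha) - t1)   # right-hand side of (2.3.6)
print(shifted, integral, bound)
```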


2.4    Analysis  Based  on  Data  For  Placement  Tests 


The data considered in this section constitute part of the results of a mathematics placement test for entering students who satisfactorily complete two years of algebra and one year of geometry in secondary school. The data set is taken from Albert (1992). There are 200 students, each answering 8 questions. Each response is classified as "correct" or "incorrect", and is coded as 1 and 0, respectively. Our interest lies in inference about the differences of the difficulty values of these questions, or more specifically in the posterior means and posterior s.d.'s of $\alpha_j - \alpha_8$ ($j = 1, \ldots, 7$). Three different links, logit, probit and log-log, are considered. We consider the prior $\pi(\theta, \alpha) \propto \exp(-\sum_i \theta_i^2/(2\sigma^2))$, that is, $\theta$ and $\alpha$ are a priori independent, the $\theta_i$ are iid $N(0, \sigma^2)$, while the $\alpha_j$ have flat priors. By the discussion of Section 2.2, $\mu$ can be taken as zero without loss of generality. For the Bayesian analysis, $\sigma$ is taken as the tuning parameter and is used to study the sensitivity of the Bayes procedure with respect to the choice of priors.

The posterior means and s.d.'s are calculated using the Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990). For this example, the number of iterates used to generate a sample is taken as 50, while the number of samples is taken as 5000. A burn-in sample of 5000 is used before the actual sample is gathered. The convergence of the Gibbs sampler was checked using the Gelman-Rubin (1992) diagnostic.
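For reference, a minimal version of the Gelman-Rubin potential scale reduction factor for a scalar quantity monitored in $m$ parallel chains of equal length $n$ is sketched below. It uses the basic form $\hat R = \sqrt{\hat V / W}$ with $\hat V = \frac{n-1}{n} W + \frac{B}{n}$, where $W$ is the mean within-chain variance and $B/n$ the variance of the chain means; Gelman and Rubin (1992) also apply a degrees-of-freedom correction that is omitted here. The chains in the usage example are simulated placeholders, not output from the analysis above.

```python
import random
import statistics

def gelman_rubin(chains):
    """Basic potential scale reduction factor for a list of m equal-length chains."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    W = statistics.fmean(statistics.variance(c) for c in chains)   # within-chain variance
    B = n * statistics.variance(means)                              # between-chain variance
    V_hat = (n - 1) / n * W + B / n
    return (V_hat / W) ** 0.5

rng = random.Random(3)
# Four chains sampling the same N(0, 1) target ...
good = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# ... and four chains stuck around different locations.
bad = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (-2.0, 0.0, 2.0, 4.0)]
print(round(gelman_rubin(good), 3), round(gelman_rubin(bad), 3))
```

Values of $\hat R$ close to 1 are consistent with convergence, while values well above 1 indicate that the chains have not yet mixed.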


The usual competitors of the Bayes estimates of $\alpha$ are the maximum likelihood estimates for the mixed model treating $\theta$ as a random effect. As mentioned in the introduction, these estimates are referred to as marginal maximum likelihood estimates (MMLE's), since they are calculated by integrating the likelihood with respect to the assumed normal distribution of the random effect $\theta$ and maximizing the resulting "marginal likelihood" of $x$ with respect to $\alpha$ and $\sigma^2$. The associated standard errors for the $\alpha_j - \alpha_k$ are based on the asymptotic distribution of the MMLE's.

Alternatively, for a fixed effects formulation with the logit link, it is possible to find sufficient statistics for the nuisance parameters $\theta_i$ ($i = 1, \ldots, n$). These are $T_1, \ldots, T_n$, where $T_i = \sum_{j=1}^k X_{ij}$. Hence, the conditional distribution of $X$ given $T = (T_1, \ldots, T_n)$ does not depend on the nuisance parameter $\theta$. The resulting likelihood, referred to as the conditional likelihood, depends only on $\alpha$. It is shown in Andersen (1970, 1973) that the MLE of $\alpha$ based on this conditional likelihood (henceforth referred to as the CMLE) is consistent. Hence, for the logit link, we have considered the CMLE's as well.
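Under the parametrization $p_{ij} = F(\theta_i + \alpha_j)$ with logistic $F$, conditioning on $T_i = t_i$ gives $P(X_i = x_i \mid T_i = t_i) = \exp(\sum_j x_{ij}\alpha_j)/\gamma_{t_i}(e^{\alpha_1}, \ldots, e^{\alpha_k})$, where $\gamma_t$ is the elementary symmetric function of order $t$; this is the standard form of the Rasch conditional likelihood. The sketch below computes it with the usual one-pass recursion for $\gamma_t$ and checks one term against a brute-force sum over all response patterns with the same total. The function names, item parameters and response pattern are hypothetical.

```python
import itertools
import math

def elementary_symmetric(eps):
    """gamma[t] = sum over all t-subsets S of prod_{j in S} eps[j], for t = 0..k."""
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        # update from the top so that each eps[j] enters a subset at most once
        for t in range(len(gamma) - 1, 0, -1):
            gamma[t] += e * gamma[t - 1]
    return gamma

def conditional_loglik(alpha, x):
    """Rasch conditional log-likelihood of alpha given the row totals t_i."""
    gamma = elementary_symmetric([math.exp(a) for a in alpha])
    return sum(sum(xij * a for xij, a in zip(row, alpha)) - math.log(gamma[sum(row)])
               for row in x)

alpha = [0.5, -0.2, 1.1, 0.0]      # hypothetical item parameters
row = [1, 0, 1, 0]                 # one hypothetical response pattern, t = 2

# Brute force: P(X = row | T = 2) = exp(row . alpha) / sum over patterns with total 2.
patterns = [p for p in itertools.product((0, 1), repeat=len(alpha)) if sum(p) == sum(row)]
denom = sum(math.exp(sum(p_j * a for p_j, a in zip(p, alpha))) for p in patterns)
brute = sum(r * a for r, a in zip(row, alpha)) - math.log(denom)
print(conditional_loglik(alpha, [row]), brute)   # the two values should agree
```

Adding a constant to every $\alpha_j$ leaves this conditional likelihood unchanged, so a constraint such as $\alpha_k = 0$ is needed when it is maximized.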

The natural question is how to choose $\sigma$ in the prior for the subject ability parameters. In many applications, a value would be suggested by previous work with subjects from the same or similar populations. Otherwise, in order to select a value, it can be helpful to develop a feeling for how the size of $\sigma$ relates to the probabilities of a bright and a not-so-bright subject getting the correct answer. The following table shows these probabilities for students at the 5th and 95th percentiles of the normal prior distribution of ability, for a variety of choices of $\sigma$ and the item parameter $\alpha$, for the logit link.


Table 2.1  Probabilities of Correct Response at (5th, 95th) Percentiles of Ability

             α = 0            α = 1            α = 2
σ = 2.0      (.036, .964)     (.092, .986)     (.216, .995)
σ = 1.5      (.078, .922)     (.187, .970)     (.385, .989)
σ = 1.0      (.162, .838)     (.344, .934)     (.588, .975)
σ = 0.5      (.305, .695)     (.544, .861)     (.765, .944)
σ = .25      (.399, .601)     (.643, .804)     (.830, .918)
σ = 0.0      (.500, .500)     (.731, .731)     (.881, .881)

This table suggests that in most applications a $\sigma$ much above 1 would be quite unusual, the differences in probabilities being severe. Lacking other information, one might select $\sigma = 1.0$ for a population expected to be moderately heterogeneous and $\sigma = 0.5$ for a population expected to be mildly heterogeneous. Notice that $\sigma = 0$ corresponds to a homogeneous population.
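Table 2.1 can be recomputed directly: under the logit link, a student at the $q$th percentile of the $N(0, \sigma^2)$ ability distribution answers an item with parameter $\alpha$ correctly with probability $F(z_q\sigma + \alpha)$, where $F$ is the logistic cdf and $z_{0.05} = -z_{0.95} \approx -1.645$. The sketch uses the standard library's `statistics.NormalDist` for the percentile; `table_2_1` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def expit(z):
    """Logistic cdf F(z)."""
    return 1.0 / (1.0 + math.exp(-z))

Z95 = NormalDist().inv_cdf(0.95)   # 95th percentile of N(0, 1), about 1.645

def table_2_1(sigmas=(2.0, 1.5, 1.0, 0.5, 0.25, 0.0), alphas=(0, 1, 2)):
    """Map sigma -> [(P(correct) at 5th pct, P(correct) at 95th pct) for each alpha]."""
    return {s: [(round(expit(-Z95 * s + a), 3), round(expit(Z95 * s + a), 3))
                for a in alphas]
            for s in sigmas}

for s, cells in table_2_1().items():
    print(f"sigma = {s:4}:", cells)
```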

These values apply to the logit link. The standard deviation of the cdf of the logit link equals $\pi/\sqrt{3}$ times that for the probit link and $\sqrt{2}$ times that for the log-log link, which suggests, in place of $\sigma = 1.0$ and $0.5$, the corresponding standard deviation values .551 and .276 for the probit link and .707 and .354 for the log-log link.
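The rescaling above divides a logit-scale $\sigma$ by $\pi/\sqrt{3}$ for the probit link and by $\sqrt{2}$ for the log-log link, using the standard deviations $\pi/\sqrt{3}$ of the standard logistic, 1 of the standard normal and $\pi/\sqrt{6}$ of the standard Gumbel distribution. The short sketch below reproduces these values and also the $\sigma$ grids that appear in Tables 2.3 and 2.4; `rescale_sigma` is a hypothetical helper name.

```python
import math

LOGISTIC_SD = math.pi / math.sqrt(3.0)   # sd of the standard logistic (logit link)
NORMAL_SD = 1.0                          # sd of the standard normal (probit link)
GUMBEL_SD = math.pi / math.sqrt(6.0)     # sd of the standard Gumbel (log-log link)

def rescale_sigma(sigma_logit, target_sd):
    """Translate a prior sd chosen on the logit scale to another link's scale."""
    return sigma_logit * target_sd / LOGISTIC_SD

for s in (0.25, 0.5, 1.0, 1.5):
    print(s, round(rescale_sigma(s, NORMAL_SD), 3), round(rescale_sigma(s, GUMBEL_SD), 3))
```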

Table 2.2 displays the Bayes estimates of the item effects, with the associated standard errors in parentheses, for the logit link using $\sigma = 0$, .25, .50, 1.0 and 1.5. That table also displays the conditional MLEs. Tables 2.3 and 2.4 show the Bayes estimates for the probit and log-log links, using comparable values of $\sigma$ for those scales. All three tables also provide the marginal MLEs, as well as the Bayes estimates for the $\sigma$ values obtained via the marginal ML approach (these are 1.05, .60, and .85 for the three links). The marginal MLEs are also the posterior modes for the Bayesian approach which uses a flat prior for $\sigma$. So, not surprisingly, some of the marginal MLEs are very close to the Bayes estimates, especially when the corresponding $\sigma$-values are close.


In the limiting case $\sigma = 0$, observations for each item are independent Bernoulli with common success probability, and a comparison of items corresponds to a comparison of independent binomial parameters. The item effects then refer to "population-averaged" effects. As $\sigma$ increases, the correlation between the responses on the various items increases. The item effects are then "subject-specific". Subject-specific effects are typically larger than population-averaged effects, the difference increasing as the correlation increases (Neuhaus, Kalbfleisch, and Hauck, 1991). Tables 2.2-2.4 show this pattern, the Bayes estimates tending to increase as $\sigma$ increases.
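The attenuation described above can be illustrated numerically. Under the logit link with abilities $\theta \sim N(0, \sigma^2)$, the population-averaged success probability for item $j$ is $\bar p_j = E[F(\theta + \alpha_j)]$, and the population-averaged item contrast is $\text{logit}(\bar p_1) - \text{logit}(\bar p_2)$. The sketch below uses hypothetical subject-specific values $\alpha_1 = 1$ and $\alpha_2 = 0$ and a simple midpoint-rule integral over $\theta$ (truncated at $\pm 8\sigma$, an arbitrary choice); the population-averaged contrast equals the subject-specific one at $\sigma = 0$ and shrinks as $\sigma$ grows.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_prob(alpha, sigma, steps=4000):
    """E[expit(theta + alpha)] for theta ~ N(0, sigma^2), midpoint rule on +-8 sd."""
    if sigma == 0.0:
        return expit(alpha)
    lo, hi = -8.0 * sigma, 8.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps):
        th = lo + (s + 0.5) * h
        dens = math.exp(-0.5 * (th / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        total += expit(th + alpha) * dens
    return total * h

alpha1, alpha2 = 1.0, 0.0            # hypothetical subject-specific item effects
for sigma in (0.0, 0.5, 1.0, 2.0):
    pa = logit(marginal_prob(alpha1, sigma)) - logit(marginal_prob(alpha2, sigma))
    print(sigma, round(pa, 3))       # population-averaged contrast
```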

Table 2.2  CML, MML and Bayes estimates for logit link (standard errors in parentheses)

                    Bayes estimates of α_j - α_8
j  CMLE/MMLE        σ = 0     σ = .25   σ = .50   σ = 1.0   σ = 1.5
1  1.991/2.036      1.677     1.709     1.784     2.012     2.235
   (0.249/0.266)    (0.223)   (0.226)   (0.230)   (0.243)   (0.260)
2  3.388/3.467      2.892     2.938     3.063     3.440     3.787
   (0.288/0.299)    (0.254)   (0.264)   (0.265)   (0.282)   (0.298)
3  2.869/2.934      2.429     2.466     2.578     2.911     3.22
   (0.268/0.282)    (0.238)   (0.240)   (0.245)   (0.263)   (0.281)
4  2.969/3.037      2.518     2.561     2.671     3.011     3.325
   (0.271/0.285)    (0.241)   (0.243)   (0.246)   (0.265)   (0.277)
5  -0.129/-0.067    -0.058    -0.055    -0.057    -0.075    -0.072
   (0.255/0.282)    (0.237)   (0.245)   (0.244)   (0.261)   (0.271)
6  1.488/1.534      1.264     1.289     1.347     1.520     1.687
   (0.245/0.212)    (0.217)   (0.222)   (0.224)   (0.239)   (0.257)
7  1.463/1.509      1.244     1.272     1.321     1.494     1.664
   (0.244/0.264)    (0.215)   (0.223)   (0.224)   (0.240)   (0.258)

Table 2.3  MML and Bayes estimates for probit link (standard errors in parentheses)

             Bayes estimates of α_j - α_8
j  MMLE      σ = 0     σ = .138  σ = .276  σ = .551  σ = .827
1  1.170     1.030     1.049     1.087     1.201     1.297
   (0.157)   (0.133)   (0.132)   (0.134)   (0.146)   (0.149)
2  1.991     1.743     1.772     1.834     2.011     2.167
   (0.173)   (0.144)   (0.146)   (0.148)   (0.157)   (0.166)
3  1.699     1.484     1.506     1.565     1.727     1.860
   (0.163)   (0.138)   (0.140)   (0.143)   (0.151)   (0.158)
4  1.737     1.535     1.555     1.608     1.766     1.901
   (0.154)   (0.138)   (0.140)   (0.143)   (0.151)   (0.158)
5  -0.065    -0.033    -0.028    -0.031    -0.023    -0.020
   (0.159)   (0.137)   (0.138)   (0.143)   (0.150)   (0.155)
6  0.875     0.775     0.789     0.817     0.907     0.982
   (0.158)   (0.132)   (0.131)   (0.134)   (0.141)   (0.145)
7  0.855     0.761     0.773     0.805     0.888     0.960
   (0.150)   (0.131)   (0.133)   (0.133)   (0.142)   (0.147)

Table 2.4  MML and Bayes estimates for log-log link (standard errors in parentheses)

             Bayes estimates of α_j - α_8
j  MMLE      σ = 0     σ = .177  σ = .354  σ = .707  σ = 1.061
1  1.374     1.114     1.140     1.202     1.371     1.518
   (0.163)   (0.143)   (0.147)   (0.149)   (0.160)   (0.169)
2  2.508     2.167     2.202     2.290     2.508     2.705
   (0.215)   (0.202)   (0.201)   (0.207)   (0.211)   (0.221)
3  2.059     1.747     1.777     1.854     2.057     2.236
   (0.190)   (0.175)   (0.174)   (0.177)   (0.185)   (0.193)
4  2.180     1.828     1.865     1.945     2.179     2.382
   (0.195)   (0.178)   (0.179)   (0.180)   (0.189)   (0.201)
5  0.015     -0.030    -0.023    -0.011    0.014     0.037
   (0.147)   (0.124)   (0.127)   (0.132)   (0.147)   (0.157)
6  1.009     0.800     0.818     0.867     1.003     1.122
   (0.155)   (0.136)   (0.138)   (0.142)   (0.151)   (0.161)
7  1.038     0.785     0.810     0.869     1.037     1.170
   (0.155)   (0.135)   (0.136)   (0.138)   (0.151)   (0.159)


2.5    Example  on  Matched  Pairs  Data 

In this section we consider the Bayesian analysis of another type of dependent observations, known as matched-pairs data. Dependent observations can occur in many situations, and the matched-pairs design is one of them. Suppose we have $n$ pairs of individuals, the pairing being done in such a way that the two individuals in any pair tend to have similar characteristics under consideration. For instance, subjects in a pair can be matched according to their relationship to each other, such as husband-wife or father-son. One member of each pair belongs to group 1 and the other to group 2. The data from matched pairs are usually summarized by a 2 × 2 table. As in the previous sections, the response variable is denoted by $X_{ij}$, and it takes the value 0 for failure and 1 for success ($i = 1, \ldots, n$; $j = 1, 2$). A typical table for these data looks like the following.

Table 2.5  A Typical 2 × 2 Table of Matched-pairs Data

                      Group 2
                      1         0
Group 1      1        n11       n12
             0        n21       n22

where $n_{11}$ = number of pairs having response 1 for both groups;
$n_{12}$ = number of pairs having response 1 for group 1 and 0 for group 2;
$n_{21}$ = number of pairs having response 0 for group 1 and 1 for group 2;
$n_{22}$ = number of pairs having response 0 for both groups.

A matched-pairs design is one of the most popular tools for case-control studies (see, for example, Liang and Zeger (1988)). We can think of group 1 as "case" and group 2 as "control" in the above table. Matched-pairs designs are also widely used in fields such as ophthalmology, where an individual serves as his/her own control.

Normally, for matched-pairs analysis, conditional maximum likelihood estimation is the most familiar and widely used technique. But there are some situations where this technique is not tenable, for example, cases where the conditioning variable (such as a sufficient statistic) is not well defined. In such cases one usually uses marginal likelihood estimation based on a mixed model. A competitor to marginal maximum likelihood estimation is the Bayesian procedure. In this section we discuss three types of estimation techniques for matched-pairs data, namely conditional maximum likelihood, marginal maximum likelihood and Bayesian estimation, and we compare and contrast the estimates obtained from these procedures. For matched pairs, the conditional and the marginal maximum likelihood estimates do not depend on the main diagonal counts, but the Bayes estimates may depend on these counts. In the next subsection, we attempt to show this dependence of the Bayes estimates on the main diagonal counts.

We now illustrate the methods of Sections 2.2 and 2.3 with the help of an example. The data are taken from the book 'Analysis of Binary Data' by D.R. Cox and E.J. Snell (1989). The dataset consists of 23 matched pairs of depressed patients, one member of each pair being classified as 'depersonalized' and the other as 'not depersonalized'. After treatment each patient is classified as 'recovered', coded as 1, or 'not recovered', coded as 0. The data follow.


Table 2.6  Frequency Counts of Mental Patients

                                     Response (group 2)
                                     Not cured (0)    Cured (1)
Response (group 1)   Not cured (0)         2              5
                     Cured (1)             2             14

We will consider three link functions, namely logit, probit and log-log, for this example. First consider the following model:

$$\text{logit}(p_{ij}) = \theta_i + \alpha_j, \quad i = 1, \ldots, n; \; j = 1, 2. \tag{2.5.1}$$


Note that the above model is over-parametrized. For conditional maximum likelihood estimation we need a constraint, and we use $\alpha_2 = 0$ for this purpose. For these data the conditional maximum likelihood estimate (CMLE) of $\alpha_1$, and hence of the difference $\alpha_1 - \alpha_2$, is $\log(n_{12}/n_{21}) = \log(2/5) = -0.916$ (Cox, 1958), and the asymptotic standard error is 0.837. To find the marginal maximum likelihood estimate (MMLE) we assume that the subject parameters $\theta_i$ are iid $N(0, \sigma^2)$. Under this setup the MMLE of interest, $\alpha_1 - \alpha_2$, is obtained using the software Mixor (Hedeker & Gibbons, 1994) and is found to be -0.916. The estimate of the prior parameter $\sigma$ is 1.258. The asymptotic standard error of the MMLE is estimated to be 0.837. For the binary matched-pairs case there is a close connection between conditional and marginal ML estimation. In fact, Neuhaus et al. (1994) showed that under the condition for consistent estimation of the item parameters, the marginal ML estimates and the conditional ML estimates are identical. To find the Bayes estimates of the item parameters $\alpha_1$ and $\alpha_2$, we consider the model described in (2.5.1) with a normal prior for the subject parameters and a flat prior for the item difficulty parameters. To avoid over-parametrization we take the mean of the normal prior to be zero (see, e.g., Albert (1992), Tsutakawa (1984)). The complete prior specification is as follows:

$$\theta_i \sim N(0, \sigma^2)$$

$$\alpha_j \sim \text{Uniform}(-\infty, \infty)$$
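The CMLE and standard error quoted above follow from the familiar closed forms for the matched-pairs logit model, $\log(n_{12}/n_{21})$ and $(1/n_{12} + 1/n_{21})^{1/2}$, evaluated at the discordant counts of Table 2.6 ($n_{12} = 2$, $n_{21} = 5$). The sketch also prints twice the CMLE, which for binary matched pairs is the ordinary (unconditional) MLE of $\alpha_1 - \alpha_2$ and matches the value -1.833 quoted below.

```python
import math

n12, n21 = 2, 5                            # discordant pairs from Table 2.6
cmle = math.log(n12 / n21)                 # CMLE of alpha_1 - alpha_2
se = math.sqrt(1.0 / n12 + 1.0 / n21)      # asymptotic standard error
print(round(cmle, 3), round(se, 3), round(2 * cmle, 3))   # -0.916 0.837 -1.833
```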

We find the Bayes estimates of $\alpha_j$, denoted by $\alpha_j^B$, $j = 1, 2$, for different values of $\sigma^2$. The results are summarized in Table 2.7. For $\sigma^2 = 1$, $\alpha_1^B - \alpha_2^B = -0.891$, and the absolute value of $\alpha_1^B - \alpha_2^B$ increases as $\sigma^2$ increases. This phenomenon is not surprising, because as $\sigma^2$ goes to $\infty$ the posterior of $\alpha$ given the data approaches the likelihood, causing $\alpha_1^B - \alpha_2^B$ to approach the ordinary ML estimate, which is -1.833 with a standard error of 1.183. More specifically, the joint posterior density of $\alpha$ and $\theta$ given the data $x$, say $f(\alpha, \theta \mid x)$, approaches the ordinary likelihood as $\sigma$ tends to $\infty$. In other words,

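$$f(\alpha, \theta \mid x) \propto \frac{\exp\Big(\sum_{j=1}^2 \alpha_j y_j + \sum_{i=1}^n \theta_i t_i - \sum_{i=1}^n \theta_i^2/(2\sigma^2)\Big)}{\prod_{i=1}^n \prod_{j=1}^2 [1 + \exp(\theta_i + \alpha_j)]},$$

which tends to the ordinary likelihood as $\sigma \to \infty$.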

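It is easy to see that the result is true for any link function, as the following calculation shows:

$$f(\alpha, \theta \mid x) \propto \prod_{i=1}^n \prod_{j=1}^2 F^{x_{ij}}(\theta_i + \alpha_j)\, \bar F^{1 - x_{ij}}(\theta_i + \alpha_j)\; e^{-\sum_{i=1}^n \theta_i^2/(2\sigma^2)},$$

where $\bar F = 1 - F$, and the right-hand side tends to the ordinary likelihood as $\sigma \to \infty$.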

At this stage it is worth looking at the other extreme, i.e., $\sigma$ approaching zero. Note that when $\sigma = 0$, there is no correlation between the repeated responses. In other words, we have two independent binomials, one for each margin. As a result, the model for the logit link becomes

$$\text{logit}(p_{ij}) = \alpha_j, \quad i = 1, \ldots, n; \; j = 1, 2.$$

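Hence, the Bayes modal estimates of $\alpha_1$ and $\alpha_2$ (under the flat prior) are simply

$$\hat\alpha_1 = \log\Big(\frac{y_1}{n - y_1}\Big) \quad \text{and} \quad \hat\alpha_2 = \log\Big(\frac{y_2}{n - y_2}\Big).$$

So,

$$\begin{aligned}
\hat\alpha_1 - \hat\alpha_2 &= \log\Big(\frac{y_1}{n - y_1}\Big) - \log\Big(\frac{y_2}{n - y_2}\Big) = \log\Big(\frac{y_1(n - y_2)}{y_2(n - y_1)}\Big) \\
&= \log\Big(\frac{(n_{11} + n_{12})(n_{12} + n_{22})}{(n_{11} + n_{21})(n_{21} + n_{22})}\Big) = \log\Big(\frac{n_{1+}\, n_{+2}}{n_{+1}\, n_{2+}}\Big) \\
&= \log(\text{marginal odds ratio}),
\end{aligned}$$

where $y_1 = n_{1+} = n_{11} + n_{12}$ and $y_2 = n_{+1} = n_{11} + n_{21}$ are the numbers of successes in groups 1 and 2.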
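For the data of Table 2.6 this modal estimate is easy to evaluate. The sketch below reads the counts with rows as group 1 and columns as group 2, the orientation under which the CMLE equals $\log(2/5)$ as stated above. Note that the result is a posterior mode under the flat prior; it need not coincide with the posterior mean discussed next.

```python
import math

n11, n12, n21, n22 = 14, 2, 5, 2          # Table 2.6 (rows: group 1, columns: group 2)
n = n11 + n12 + n21 + n22                 # 23 pairs
y1, y2 = n11 + n12, n11 + n21             # successes in groups 1 and 2
log_mor = math.log(y1 * (n - y2) / (y2 * (n - y1)))
print(y1, y2, round(log_mor, 3))          # 16 19 -0.731
```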

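For $\sigma = 0$ the general model for any link function becomes

$$F^{-1}(p_{ij}) = \alpha_j, \quad i = 1, \ldots, n; \; j = 1, 2. \tag{2.5.2}$$

Under this general model, the log-likelihood function is given by

$$\log L(\alpha; x) = \sum_{j=1}^2 y_j \log F(\alpha_j) + \sum_{j=1}^2 (n - y_j) \log \bar F(\alpha_j). \tag{2.5.3}$$

Hence, the Bayes modal estimate of $\alpha_j$ is simply

$$\hat\alpha_j = F^{-1}\Big(\frac{y_j}{n}\Big). \tag{2.5.4}$$

Therefore, the Bayes modal estimate of $\alpha_1 - \alpha_2$ is given by

$$\hat\alpha_1 - \hat\alpha_2 = F^{-1}\Big(\frac{y_1}{n}\Big) - F^{-1}\Big(\frac{y_2}{n}\Big).$$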
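Formula (2.5.4) can be evaluated for all three links at the margins of Table 2.6 ($n = 23$, $y_1 = 16$, $y_2 = 19$). The inverse cdfs below are the logit, the standard normal quantile (via `statistics.NormalDist`) and the inverse of the log-log cdf $F(z) = \exp(-e^{-z})$ used in (2.2.33); these are modal estimates and are shown only for comparison with the posterior means reported in the text.

```python
import math
from statistics import NormalDist

n, y1, y2 = 23, 16, 19                    # margins of Table 2.6

INVERSE_LINKS = {
    "logit": lambda p: math.log(p / (1.0 - p)),
    "probit": NormalDist().inv_cdf,
    "log-log": lambda p: -math.log(-math.log(p)),   # inverse of F(z) = exp(-exp(-z))
}

for name, F_inv in INVERSE_LINKS.items():
    print(name, round(F_inv(y1 / n) - F_inv(y2 / n), 3))
```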

Now, for the logit link, under the model described in (2.5.2), the Bayes estimate (the posterior mean) of $\alpha_1 - \alpha_2$ is found to be -0.796 with a standard error of 0.743. The Bayes estimate under the original model should, therefore, approach this value as $\sigma \to 0$. In actual calculation we observe this pattern as we make $\sigma$ smaller and smaller (Table 2.7).
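For the logit link the $\sigma = 0$ posterior is also available in closed form, which gives a cross-check on the simulated value. Under a flat prior on $\alpha_j$, the success probability $p_j = F(\alpha_j)$ has posterior density proportional to $p_j^{y_j - 1}(1 - p_j)^{n - y_j - 1}$, i.e., Beta$(y_j, n - y_j)$, and for $p \sim$ Beta$(a, b)$ one has $E[\text{logit}\, p] = \psi(a) - \psi(b)$ and $\text{Var}[\text{logit}\, p] = \psi'(a) + \psi'(b)$, with $\psi$ the digamma function. This derivation is ours and is offered as a check, not as the computation behind the reported numbers. For integer arguments these reduce to finite sums, so the standard library suffices.

```python
import math

def partial_harmonic(m):
    """1 + 1/2 + ... + 1/(m - 1), so that psi(m) = partial_harmonic(m) - Euler's gamma."""
    return sum(1.0 / k for k in range(1, m))

def trigamma(m):
    """psi'(m) for a positive integer m."""
    return math.pi ** 2 / 6.0 - sum(1.0 / k ** 2 for k in range(1, m))

n, y1, y2 = 23, 16, 19
mean1 = partial_harmonic(y1) - partial_harmonic(n - y1)   # E[alpha_1 | x], p_1 ~ Beta(16, 7)
mean2 = partial_harmonic(y2) - partial_harmonic(n - y2)   # E[alpha_2 | x], p_2 ~ Beta(19, 4)
var_diff = sum(trigamma(m) for m in (y1, n - y1, y2, n - y2))   # alpha_1, alpha_2 independent
print(round(mean1 - mean2, 4), round(math.sqrt(var_diff), 4))
```

The exact values, about -0.794 and 0.746, agree with the simulated -0.796 and 0.743 up to Monte Carlo error.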


Table 2.7  $\alpha_j^B$, $\alpha_1^B - \alpha_2^B$ and associated Standard Errors

σ            α_1^B    α_2^B    α_1^B - α_2^B    SE(α_1^B - α_2^B)

f.UYXlU 

U.o7U 

1.000 

-0.794 

0.74o 

U.zz4 

U.o93 

1.7UU 

a  one 

-U.oUo 

(J. 752 

U.olb 

1.735 

-U.oU3 

0.757 

v.ivi 

1.1  »  1 

-U.o4y 

U.  r 

1 

1  A~\  A 

1.414 

z.oUo 

-U.oyl 

n  qai 

1  99^ 

1  filfi 

±  .UdU 

^ .  o\jo 

-0  Q9Q 

U.jOO 

1.414        1.815    2.787    -0.972           0.910
1.581        2.014    2.997    -0.983           0.935
2.236        2.681    3.791    -1.111           1.056
2.739        3.175    4.373    -1.198           1.299
3.162        3.614    4.893    -1.280           1.243
3.873        4.288    5.689    -1.401           1.356
5            2.731    4.541    -1.811           1.487

Next we consider the probit link. For this link the model becomes

$$p_{ij} = \Phi(\theta_i + \alpha_j), \quad i = 1, \ldots, n; \; j = 1, 2.$$

First note that for this model there do not exist conditional ML estimates of the item parameters. This is because sufficient statistics for the subject parameters do not exist for the above model. So for this model we will consider only marginal ML and Bayes estimates. The marginal ML estimate of $\alpha_1 - \alpha_2$ is found, using Mixor (Hedeker & Gibbons, 1994), to be $\hat\alpha_1 - \hat\alpha_2 = -0.528$ with standard error 0.476. We take, as before, the $\theta_i$'s to be iid $N(0, \sigma^2)$. The estimate of $\sigma^2$ from the marginal likelihood is 0.532. To find the Bayes estimates of $\alpha_1$ and $\alpha_2$ we take the subject parameters $\theta_i$, $i = 1, \ldots, n$, as iid normal with mean 0 and some variance $\sigma^2$. As before, we take a flat prior for the item difficulty parameters. The final prior specification is the following:

$$\theta_i \overset{iid}{\sim} N(0, \sigma^2)$$

$$\alpha_j \sim \text{Uniform}(-\infty, \infty)$$


Let $\alpha_{Pj}^B$ be the Bayes estimate of $\alpha_j$, $j = 1, 2$. We calculate these Bayes estimates for different values of $\sigma$. The results are summarized in Table 2.8. For $\sigma = 1$, $\alpha_{P1}^B - \alpha_{P2}^B = -0.605$, and when $\sigma = 0.729$, the estimate of $\sigma$ from the marginal likelihood, the Bayes estimate is $\alpha_{P1}^B - \alpha_{P2}^B = -0.532$, a value which is very close to the MML estimate. From the table it is seen immediately that the estimate $\alpha_{P1}^B - \alpha_{P2}^B$ decreases as $\sigma$ increases.

The unconditional maximum likelihood estimate of $\alpha_1 - \alpha_2$ for the probit link is $-1.132$, with standard error 0.710. As mentioned earlier, the posterior approaches the ordinary likelihood as $\sigma$ approaches infinity. So the Bayes estimate of $\alpha_1 - \alpha_2$ should approach the ordinary ML estimate $-1.132$ as $\sigma$ grows, and Table 2.8 shows this pattern.

Also, when $\sigma = 0$ the model becomes the one specified in (2.5.2), with $F$ replaced by $\Phi$. The Bayes estimate of $\alpha_1 - \alpha_2$ in this case is $-0.426$, with standard error 0.421. We therefore expect the Bayes estimate under the original model to approach this value as $\sigma$ becomes smaller, and Table 2.8 shows this trend as well.
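The section reports Bayes estimates for several values of $\sigma$ but does not restate how they are computed; Section 2.3 describes the implementation actually used. The sketch below assumes that the Bayes estimate is the joint posterior mode of $(\theta, \alpha)$. That assumption appears consistent with the $\sigma \to 0$ and $\sigma \to \infty$ limits described above. Under it, the Python code (NumPy/SciPy) computes $\hat\alpha_1 - \hat\alpha_2$ for a 2 x 2 matched-pairs table under any of the three links.

Subjects with the same response pattern share the same posterior mode, so only four $\theta$'s are needed. The function name, the pattern coding (1 = success) and the example counts are illustrative assumptions, and the output is not guaranteed to match the tables.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# log F(t) and log(1 - F(t)) for the three links discussed in the text
LINKS = {
    "logit": (lambda t: -np.logaddexp(0.0, -t), lambda t: -np.logaddexp(0.0, t)),
    "probit": (norm.logcdf, lambda t: norm.logcdf(-t)),
    # log-log link: F(t) = exp(-exp(-t))
    "loglog": (lambda t: -np.exp(-t), lambda t: np.log(-np.expm1(-np.exp(-t)))),
}

# Response patterns (x1, x2); subjects sharing a pattern share a posterior mode.
PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def joint_posterior_mode_diff(counts, sigma, link="logit"):
    """Joint posterior mode of alpha_1 - alpha_2 (hypothetical helper) under
    theta_i ~ N(0, sigma^2) and a flat prior on (alpha_1, alpha_2).
    `counts` maps a pattern (x1, x2) to the number of subjects."""
    log_f, log_1mf = LINKS[link]
    n = np.array([counts.get(p, 0) for p in PATTERNS], dtype=float)
    x = np.array(PATTERNS, dtype=float)  # shape (4, 2)

    def neg_log_posterior(params):
        z, alpha = params[:4], params[4:]
        # theta = sigma * z, so the N(0, sigma^2) prior becomes -z^2 / 2;
        # this keeps the problem well conditioned when sigma is small.
        eta = sigma * z[:, None] + alpha[None, :]
        loglik = x * log_f(eta) + (1.0 - x) * log_1mf(eta)
        return -np.dot(n, loglik.sum(axis=1) - 0.5 * z**2)

    result = minimize(neg_log_posterior, np.zeros(6), method="BFGS")
    return result.x[4] - result.x[5]


counts = {(0, 0): 8, (0, 1): 35, (1, 0): 10, (1, 1): 20}
for s in (1e-3, 0.224, 1.0, 3.0):
    print(s, round(joint_posterior_mode_diff(counts, s), 3))
```

With a very small $\sigma$, the posterior mode reduces to the marginal (population-averaged) estimate. For the counts above this is the marginal log odds ratio $\log[n_{+1}n_{2+}/(n_{+2}n_{1+})] \approx -1.48$ under the logit link.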

Table 2.8  α̂pj, α̂p1 - α̂p2 and associated Standard Errors

σ             α̂p1     α̂p2     α̂p1 - α̂p2    SE(α̂p1 - α̂p2)
7.07×10^-4    0.529   0.964   -0.435        0.408
1             0.717   1.322   -0.605        0.486
1.581         0.906   1.612   -0.706        0.521
2.236         1.056   1.788   -0.731        0.538
3.162         1.155   1.901   -0.745        0.531
3.873         1.163   1.906   -0.742        0.543
4.472         1.140   1.886   -0.747        0.542
5.477         1.212   1.987   -0.776        0.555
7.071         1.687   2.777   -1.090        0.667



Table 2.9  α̂Lj, α̂L1 - α̂L2 and associated Standard Errors

σ             α̂L1     α̂L2     α̂L1 - α̂L2    SE(α̂L1 - α̂L2)
[illegible]   1.084   1.778   [illegible]   [illegible]
0.071         1.086   1.791   -0.705        0.672
1             1.339   2.131   -0.792        0.695
1.581         1.566   2.515   -0.949        0.724
2.236         1.915   3.034   -1.119        0.791
2.739         2.150   3.365   -1.215        0.827
3.162         2.416   3.774   -1.358        0.878
3.873         2.696   4.197   -1.501        0.932
4.472         3.035   4.603   -1.568        0.965

Next we consider the log-log link for the same example, following the same steps as for the logit link. To find the marginal ML estimate, we take the $\theta_i$'s to be $N(0, \sigma^2)$. The marginal ML estimate of the difference $\alpha_1 - \alpha_2$ is $-0.748$, with standard error 0.674, and the marginal ML estimate of the prior parameter $\sigma$ is 1.011.

To calculate Bayes estimates, we take the same prior for the subject ability parameters, namely $N(0, \sigma^2)$, and a flat prior for the item parameters. More specifically,

$$\theta_i \stackrel{iid}{\sim} N(0, \sigma^2), \qquad \alpha_j \sim \text{Uniform}(-\infty, \infty).$$

Let $\hat\alpha_{Lj}$ denote the Bayes estimate of $\alpha_j$, $j = 1, 2$. The results are summarized in Table 2.9. For $\sigma = 1$, $\hat\alpha_{L1} - \hat\alpha_{L2} = -0.792$, and the difference decreases as $\sigma$ increases.

For the log-log link, the unconditional maximum likelihood estimate of $\alpha_1 - \alpha_2$ is $-1.581$, with standard error 0.881. As mentioned earlier, the posterior approaches the ordinary likelihood as $\sigma$ approaches infinity. So the Bayes estimate of $\alpha_1 - \alpha_2$ should get closer to the ordinary ML estimate $-1.581$ for large values of $\sigma$, and we see this pattern in Table 2.9.

Also, when $\sigma = 0$ the model becomes the one specified in (2.5.2) with $F(x) = \exp(-\exp(-x))$. The Bayes estimate of $\alpha_1 - \alpha_2$ under this model is $-0.688$, with standard error 0.656. We therefore expect the Bayes estimate under the original model to approach this value as $\sigma$ becomes smaller, and we observe this pattern in Table 2.9.

In this example we have seen a definite pattern in the Bayes estimates for the three link functions. For all three links (logit, probit and log-log), the Bayes estimate of $\alpha_1 - \alpha_2$ tends to decrease as $\sigma$ increases, where $\sigma$ is the prior standard deviation of the subject parameters. We conjecture that this empirical pattern holds mathematically, but we have not yet found a formal proof. In the next subsection we attempt to relate the Bayes estimates to the main diagonal counts of matched-pairs tables.

2.5.1    Effect  of  Diagonal  Elements 

The conditional ML estimates do not take into account the main diagonal elements of matched-pairs tables. They remain the same if we change the main diagonal counts while keeping the off-diagonal counts fixed. In this subsection we investigate the effect, if any, of the main diagonal counts of matched-pairs tables on the Bayes estimates.

For this purpose we artificially created 25 tables, constructed so that the counts along the main diagonal increase. The following table lists the datasets.
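The statement that CML estimates ignore the main diagonal can be made concrete. Under the logit link, the CML estimate of $\alpha_1 - \alpha_2$ for a matched-pairs table is the log ratio of the two discordant counts. With the cell labels of Table 2.10 this is $\log(n_{21}/n_{12})$, which reproduces the CML values quoted in Table 2.11; the sign convention is inferred from those values.

The short Python sketch below illustrates this; the function name is illustrative.

```python
import math


def cml_logit(n12, n21):
    """CML estimate of alpha_1 - alpha_2 for the logit matched-pairs model.

    Only the discordant (off-diagonal) counts enter, so the main diagonal
    counts n11 and n22 have no effect on this estimate."""
    return math.log(n21 / n12)


# Off-diagonal settings (n12, n21) used for the artificial tables
off_diagonals = [(35, 10), (36, 25), (47, 48), (49, 75), (55, 195)]
for n12, n21 in off_diagonals:
    print(f"n12={n12:3d} n21={n21:3d}  CML={cml_logit(n12, n21):.3f}")
```

The printed values are -1.253, -0.365, 0.021, 0.426 and 1.266, the CML entries in the header of Table 2.11.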




Table 2.10  Artificially created tables of Matched-pairs Data.

            Main Diagonal Counts    Off Diagonal Counts
Table No    n11     n22             n12     n21
1           8       20              35      10
2           8       20              36      25
3           8       20              47      48
4           8       20              49      75
5           8       20              55      195
6           15      34              35      10
7           15      34              36      25
8           15      34              47      48
9           15      34              49      75
10          15      34              55      195
11          35      85              35      10
12          35      85              36      25
13          35      85              47      48
14          35      85              49      75
15          35      85              55      195
16          50      126             35      10
17          50      126             36      25
18          50      126             47      48
19          50      126             49      75
20          50      126             55      195
21          65      135             35      10
22          65      135             36      25
23          65      135             47      48
24          65      135             49      75
25          65      135             55      195
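Table 2.10 is a full 5 x 5 crossing of five main-diagonal settings with five off-diagonal settings. The off-diagonal setting varies fastest. A short Python sketch that regenerates the design is shown below; the dictionary layout is an illustrative choice.

```python
from itertools import product

# (n11, n22) main diagonal settings and (n12, n21) off-diagonal settings
main_diagonals = [(8, 20), (15, 34), (35, 85), (50, 126), (65, 135)]
off_diagonals = [(35, 10), (36, 25), (47, 48), (49, 75), (55, 195)]

# Tables 1-25, numbered as in Table 2.10 (off-diagonal pair varies fastest)
tables = {
    number: {"n11": d[0], "n22": d[1], "n12": o[0], "n21": o[1]}
    for number, (d, o) in enumerate(product(main_diagonals, off_diagonals), start=1)
}

for number in (1, 6, 17, 25):
    print(number, tables[number])
```

Tables that share an off-diagonal pair differ only in their main diagonal, and their numbers differ by multiples of 5.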

We are interested in finding the Bayes estimates under the model

$$\text{logit}(p_{ij}) = \theta_i + \alpha_j, \qquad i = 1, \ldots, n; \; j = 1, 2.$$

We use the notation of Section 2.2. Our objective is to find Bayes estimates of $\alpha_1 - \alpha_2$ under the priors

$$\theta_i \stackrel{iid}{\sim} N(0, \sigma^2), \qquad \alpha_j \sim \text{Uniform}(-\infty, \infty).$$


Under the above model we find that, for each fixed pair of off-diagonal counts, the Bayes estimates change with the main diagonal counts. As the main diagonal counts increase, the estimates increase in the first two columns of Table 2.11 and decrease in the remaining three columns.

Our objective is to detect any dependence of the Bayes estimates on the main diagonal counts, so we expect the Bayes estimates to vary with these counts. At the same time, we do not want them to vary so much that they end up far from the corresponding marginal ML estimates. MML estimates are known from theory to be consistent, and we want the Bayes procedure for estimating the $\alpha$'s to be safe in this respect.

Table 2.11, however, shows that the Bayes estimates can be far from their MML estimates (which here coincide with the CML estimates). We observe that as we move away from the prior $N(0, \hat\sigma^2)$, $\hat\sigma$ being the marginal ML estimate of the prior parameter $\sigma$, the Bayes estimates move further from the marginal ML estimates.

A possible explanation is the following. The prior for the subject parameters $\theta_i$ ($i = 1, \ldots, n$) is $N(0, \sigma^2)$. A large value of $\sigma^2$ indicates a strong correlation between the repeated observations, a small value corresponds to a weak correlation, and $\sigma = 0$ corresponds to no correlation. Picking an arbitrary value for $\sigma^2$ therefore amounts to a guess about the underlying correlation structure. When this guess is far from the true correlation structure, the Bayes estimates can end up very different from the marginal ML estimates.

In the following table (Table 2.11), the estimates labelled "$\hat\alpha_1 - \hat\alpha_2$" use $\sigma = 1$. We also estimated the item parameters using $\sigma = 0.224$; these estimates are labelled "$\hat\alpha_{12} - \hat\alpha_{22}$". They behave similarly to those obtained with $\sigma = 1$: as the main diagonal counts increase, the Bayes estimates increase in the first two columns and tend to decrease in the remaining three columns. We also notice that for larger main diagonal counts the Bayes estimates usually move toward the marginal ML estimates.
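The link between $\sigma$ and the within-subject correlation can be illustrated numerically. The sketch below assumes the logit model with $\theta \sim N(0, \sigma^2)$ and responses that are independent given $\theta$. Under those assumptions, it computes $\text{corr}(x_1, x_2)$ by Gauss-Hermite quadrature using NumPy's `hermegauss`; the 60-node rule and the function name are illustrative choices.

The correlation is 0 at $\sigma = 0$. When $\alpha_1 = \alpha_2$ it increases with $\sigma$, because the covariance is then the variance of $F(\sigma Z)$.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def within_subject_correlation(sigma, alpha1=0.0, alpha2=0.0, n_nodes=60):
    """Correlation of a subject's two binary responses when
    logit P(x_j = 1 | theta) = theta + alpha_j and theta ~ N(0, sigma^2)."""
    z, w = hermegauss(n_nodes)        # nodes/weights for the weight exp(-z^2/2)
    w = w / np.sqrt(2.0 * np.pi)      # weights now sum to 1: expectation over N(0, 1)
    theta = sigma * z
    p1 = 1.0 / (1.0 + np.exp(-(theta + alpha1)))
    p2 = 1.0 / (1.0 + np.exp(-(theta + alpha2)))
    m1, m2 = w @ p1, w @ p2           # marginal success probabilities
    m12 = w @ (p1 * p2)               # P(x1 = 1, x2 = 1), conditional independence
    return (m12 - m1 * m2) / np.sqrt(m1 * (1 - m1) * m2 * (1 - m2))


for s in (0.0, 0.224, 1.0, 3.0):
    print(s, round(within_subject_correlation(s), 3))
```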




Table 2.11.  CML and Bayes estimates for the artificially created tables.
(Rows: main diagonal counts (n11, n22). Columns: off-diagonal counts (n12, n21) with the CML estimate.
Each cell: α̂1 - α̂2 (σ = 1) / α̂12 - α̂22 (σ = 0.224).)

Main diagonal   (35, 10)         (36, 25)         (47, 48)        (49, 75)        (55, 195)
                CML = -1.253     CML = -0.365     CML = 0.021     CML = 0.426     CML = 1.266
(8, 20)         -1.514 / -1.732  -0.514 / 0.595   0.038 / 0.035   0.708 / 0.796   2.261 / 2.543
(15, 34)        -1.169 / -1.347  -0.420 / -0.486  0.025 / 0.033   0.618 / 0.709   2.072 / 2.348
(35, 85)        -0.370 / -0.399  -0.266 / -0.308  0.023 / 0.020   0.457 / 0.526   1.653 / 1.890
(50, 126)       -0.530 / -0.610  -0.213 / -0.249  0.018 / 0.016   0.376 / 0.405   1.440 / 1.654
(65, 135)       -0.351 / -0.530  -0.184 / -0.214  0.016 / 0.021   0.346 / 0.398   1.343 / 1.545

As mentioned in the previous section, with $\sigma = 0$ the Bayes posterior mode is simply the marginal log odds ratio $\log[n_{+1}n_{2+}/(n_{+2}n_{1+})]$. For $\sigma > 0$, i.e., for positive correlation between the repeated responses, this marginal log odds ratio is smaller in magnitude than the subject-specific log odds ratio (Neuhaus, Kalbfleisch, and Hauck, 1991).

We conjecture that a definite pattern may emerge if we consider tables with fixed marginal log odds ratios as well as fixed off-diagonal counts. Fixing both might narrow down the change in the estimates (if any) caused by changes in the main diagonal counts. To examine this we created 8 tables: the first 4 have marginal log odds ratio -1.48 and the last 4 have 0.79.

There was little freedom in choosing the main diagonal counts while keeping both the marginal log odds ratio and the off-diagonal counts fixed. For the first 3 tables, the total count and the sum of the main diagonal counts are almost fixed. For this reason we created the fourth table, in which we changed all the counts and kept only the marginal log odds ratio fixed. To get more choice for the main diagonal counts, we created the last four tables starting from a table with relatively large cell counts.
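The marginal log odds ratios quoted for the tables of Table 2.12 can be checked directly. The Python sketch below computes $\log[n_{+1}n_{2+}/(n_{+2}n_{1+})]$ for each table; the function name is illustrative.

The value is unchanged if $n_{11}$ and $n_{22}$ are swapped, so the order in which the main diagonal counts are listed does not matter. The first four tables give values within 0.01 of -1.48. The last four lie between about 0.79 and 0.81.

```python
import math


def marginal_log_odds_ratio(n11, n12, n21, n22):
    """log[n_{+1} n_{2+} / (n_{+2} n_{1+})] for a 2 x 2 matched-pairs table."""
    n1p, n2p = n11 + n12, n21 + n22   # row totals
    np1, np2 = n11 + n21, n12 + n22   # column totals
    return math.log(np1 * n2p / (np2 * n1p))


# Table 2.12: (off-diagonal (n12, n21), main diagonal (n11, n22))
table_2_12 = [
    ((35, 10), (8, 20)), ((35, 10), (10, 16)), ((35, 10), (15, 11)),
    ((65, 15), (38, 25)), ((55, 135), (195, 65)), ((55, 135), (175, 65)),
    ((55, 135), (170, 70)), ((55, 135), (75, 160)),
]
for (n12, n21), (n11, n22) in table_2_12:
    print(n12, n21, n11, n22, round(marginal_log_odds_ratio(n11, n12, n21, n22), 3))
```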

The results for these eight tables, with $\sigma = 1$, are summarized in Table 2.12. The estimates do not change considerably, even for the fourth table, whose main diagonal counts are very different. The same holds for the other set of tables.

As mentioned earlier, our main interest is the dependence of the Bayes estimates on the main diagonal counts. So far we have noticed empirically that the Bayes estimates do change with the main diagonal counts, but they do not always follow the same definite pattern. We do not have a concrete intuitive explanation for the pattern seen in Table 2.11.

We believe there is a theoretical relationship between the Bayes estimates and the prior parameter $\sigma$. We also believe that the Bayes estimates and the marginal ML estimate of $\sigma$ are mathematically related through the main diagonal counts. However, we have not been able to prove this formally.

Table 2.12  Artificially created tables of matched-pairs data
and associated Bayes estimates.

Table No.   Off Diagonal    Main Diagonal    α̂1 - α̂2
1           35    10        8      20        -1.769
2           35    10        10     16        -1.783
3           35    10        15     11        -1.777
4           65    15        38     25        -1.776
5           55    135       195    65        0.944
6           55    135       175    65        0.972
7           55    135       170    70        0.960
8           55    135       75     160       0.955


From the preceding discussion of estimation for matched-pairs data, it is clear that the value chosen for the prior parameter $\sigma^2$ plays an important role in the Bayesian estimation. To avoid specifying an arbitrary value for $\sigma^2$, the most appropriate approach is to let the data determine the underlying correlation structure. This can be accomplished by placing a hierarchical structure on the subject ability parameters. We discuss hierarchical Bayesian estimation in the next chapter.


2.6    Uniform  approximation  of  improper  priors  by  proper  priors 

So far we have considered a normal prior for the subject ability parameters and a flat prior for the item difficulty parameters. Bayesian methods depend on the prior specification, and when prior information is vague a non-informative or flat prior is a sensible choice. Non-informative (improper) priors can be approximated by a certain class of proper priors.

In this section we extend the work of Mukhopadhyay and Dasgupta (1992) and Ghosh and Mukhopadhyay (1994) to obtain a class of proper priors that approximate improper priors in item response models. We use the notation of Section 2.2 for the model parameters and the likelihood function.

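Consider a class of priors of the form

$$\pi_\sigma(\alpha) = \frac{c}{\sigma^{k-1}}\,\pi\!\left(\frac{\alpha}{\sigma}\right), \qquad (2.6.1)$$

where $c$ is a constant such that

$$1 = \int_{R^{k-1}} \frac{c}{\sigma^{k-1}}\,\pi\!\left(\frac{\alpha}{\sigma}\right) d\alpha.$$

We use the following criterion to measure the uniform closeness of the posteriors $\pi_\sigma(\alpha|x)$ to the posteriors $f(\alpha|x)$, $f(\alpha|x)$ being the posterior of $\alpha$ under a flat prior:

$$\sup_x \int \left|\pi_\sigma(\alpha|x) - f(\alpha|x)\right| d\alpha.$$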


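We want to find sufficient conditions under which

$$\sup_x \int \left|\pi_\sigma(\alpha|x) - f(\alpha|x)\right| d\alpha \to 0 \quad \text{as } \sigma \to \infty. \qquad (2.6.2)$$

Theorem 6. A pair of sufficient conditions for (2.6.2) to hold is given by

$$\int_{R^{k-1}} L(x;\alpha)\,d\alpha < \infty \quad \text{for all } x, \qquad (2.6.3)$$

$$\sup_x \int_{R^{k-1}} \left|1 - \pi\!\left(\frac{\alpha}{\sigma}\right)\right| f(\alpha|x)\,d\alpha \le g(\sigma) \to 0 \quad \text{as } \sigma \to \infty. \qquad (2.6.4)$$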


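Proof. The normalizing constant $c/\sigma^{k-1}$ cancels in the posterior. Therefore

$$\int \left|\pi_\sigma(\alpha|x) - f(\alpha|x)\right| d\alpha = \int \left| \frac{L(x;\alpha)\,\pi(\alpha/\sigma)}{\int L(x;\alpha)\,\pi(\alpha/\sigma)\,d\alpha} - \frac{L(x;\alpha)}{\int L(x;\alpha)\,d\alpha} \right| d\alpha$$

$$\le \int \frac{L(x;\alpha)\,\pi(\alpha/\sigma)}{\int L(x;\alpha)\,\pi(\alpha/\sigma)\,d\alpha}\cdot\frac{\left|\int L(x;\alpha)\,d\alpha - \int L(x;\alpha)\,\pi(\alpha/\sigma)\,d\alpha\right|}{\int L(x;\alpha)\,d\alpha}\,d\alpha + \int \left|1 - \pi\!\left(\frac{\alpha}{\sigma}\right)\right| \frac{L(x;\alpha)}{\int L(x;\alpha)\,d\alpha}\,d\alpha$$

$$\le 2\int \left|1 - \pi\!\left(\frac{\alpha}{\sigma}\right)\right| f(\alpha|x)\,d\alpha \le 2g(\sigma) \to 0 \quad \text{as } \sigma \to \infty,$$

uniformly in $x$.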
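Theorem 6 can be illustrated numerically in one dimension ($k = 2$). As an illustrative assumption, take the likelihood $L(\alpha) = e^{y\alpha}/(1 + e^\alpha)^n$, which is the logit likelihood of one item when every $\theta_i = 0$, and take $\pi(u) = \exp(-u^2/2)$.

The Python sketch below evaluates, on a fine grid, the $L_1$ distance between the two posteriors and the bound $2\int|1 - \pi(\alpha/\sigma)|\,f(\alpha|x)\,d\alpha$ from the proof. The distance shrinks as $\sigma$ grows and never exceeds the bound. The function name, grid and data ($y = 3$, $n = 10$) are hypothetical choices.

```python
import numpy as np


def l1_distance_to_flat(sigma, y=3, n=10):
    """L1 distance between the posterior under the proper prior
    pi(alpha / sigma) = exp(-alpha^2 / (2 sigma^2)) and the flat-prior
    posterior, for L(alpha) = e^{y alpha} / (1 + e^alpha)^n.
    Also returns the bound 2 * int |1 - pi(alpha/sigma)| f(alpha | x) d alpha."""
    alpha = np.linspace(-40.0, 40.0, 200_001)
    dx = alpha[1] - alpha[0]
    log_lik = y * alpha - n * np.logaddexp(0.0, alpha)
    lik = np.exp(log_lik - log_lik.max())            # rescaled for stability
    prior = np.exp(-alpha**2 / (2.0 * sigma**2))     # normalizing constant cancels
    f_flat = lik / (lik.sum() * dx)
    f_proper = lik * prior / ((lik * prior).sum() * dx)
    distance = np.abs(f_proper - f_flat).sum() * dx
    bound = 2.0 * (np.abs(1.0 - prior) * f_flat).sum() * dx
    return distance, bound


for s in (1.0, 2.0, 4.0, 8.0, 16.0):
    d, b = l1_distance_to_flat(s)
    print(f"sigma={s:5.1f}  distance={d:.4f}  bound={b:.4f}")
```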


We now check the two conditions of the theorem for some specific link functions, taking $\pi(u) = \exp(-u'u/2)$, so that $1 - \pi(\alpha/\sigma) = 1 - e^{-\alpha'\alpha/2\sigma^2}$. First consider the logit link. Condition (2.6.3) has already been checked in the previous section, so we need only check the second condition.

We do this in two stages. In the first stage we find an upper bound, uniform in $x$, for the numerator $\int |1 - \pi(\alpha/\sigma)|\,L(x;\alpha)\,d\alpha$. In the second stage we find a uniform lower bound for the denominator $\int L(x;\alpha)\,d\alpha$.

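First note that $1 - e^{-u} \le u$ for $u \ge 0$. Write $t_i = \sum_{j=1}^{k} x_{ij}$ and $y_j = \sum_{i=1}^{n} x_{ij}$. Then

$$\int_{R^{k-1}} \left|1 - e^{-\alpha'\alpha/2\sigma^2}\right| L(x;\alpha)\,d\alpha \le \frac{1}{2\sigma^2}\sum_{j=1}^{k-1}\int_{R^{k-1}} \alpha_j^2\,L(x;\alpha)\,d\alpha$$

$$= \frac{1}{2\sigma^2}\sum_{j=1}^{k-1}\int_{R^n}\frac{e^{\sum_i t_i\theta_i - \sum_i(\theta_i-\mu_0)^2/2\sigma_0^2}}{\prod_i(1+e^{\theta_i})}\left(\int_{-\infty}^{\infty}\frac{\alpha_j^2\,e^{y_j\alpha_j}}{\prod_i(1+e^{\theta_i+\alpha_j})}\,d\alpha_j\right)\prod_{l\ne j}\left(\int_{-\infty}^{\infty}\frac{e^{y_l\alpha_l}}{\prod_i(1+e^{\theta_i+\alpha_l})}\,d\alpha_l\right)d\theta$$

$$\le \frac{2^{k-2}(k-1)}{\sigma^2}\,M \to 0 \quad \text{as } \sigma \to \infty,$$

where $M$ ($< \infty$) is a constant independent of $x$. It is obtained by bounding the inner $\alpha$-integrals and completing the square in each $\theta_i$.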


Next we bound the denominator from below using two facts. The first is the inequality $1 + e^{\theta+\alpha} \le \max(1, e^{\theta})(1 + e^{\alpha})$. The second is the substitution $u = e^{\alpha}$, which gives

$$\int_{-\infty}^{\infty} \frac{e^{y\alpha}}{(1+e^{\alpha})^n}\,d\alpha = \int_0^\infty \frac{u^{y-1}}{(1+u)^n}\,du = \frac{\Gamma(y)\,\Gamma(n-y)}{\Gamma(n)}.$$

Together these give

$$\int_{R^{k-1}} L(x;\alpha)\,d\alpha = \int_{R^n}\frac{e^{\sum_i t_i\theta_i - \sum_i(\theta_i-\mu_0)^2/2\sigma_0^2}}{\prod_i(1+e^{\theta_i})}\prod_{j=1}^{k-1}\left(\int_{-\infty}^{\infty}\frac{e^{y_j\alpha_j}}{\prod_i(1+e^{\theta_i+\alpha_j})}\,d\alpha_j\right)d\theta$$

$$\ge \int_{R^n}\frac{e^{\sum_i t_i\theta_i - \sum_i(\theta_i-\mu_0)^2/2\sigma_0^2}}{\prod_i \max(1,e^{\theta_i})^{k-1}(1+e^{\theta_i})}\prod_{j=1}^{k-1}\frac{\Gamma(y_j)\,\Gamma(n-y_j)}{\Gamma(n)}\,d\theta.$$

Since $\Gamma(y_j)\Gamma(n-y_j) \ge 1$ for $1 \le y_j \le n-1$, the product of Beta integrals is at least $c_1 = \Gamma(n)^{-(k-1)}$. Next, split each $\theta_i$-integral at 0, bound the integrand from below on each half-line, and complete the square. This gives a strictly positive lower bound that depends only on $n$, $k$, $\mu_0$ and $\sigma_0$; the bound is finite and independent of $x$.
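The logit lower bound uses two elementary facts:

- the inequality $1 + e^{\theta+\alpha} \le \max(1, e^\theta)(1 + e^\alpha)$;
- via $u = e^\alpha$, the identity $\int e^{y\alpha}(1 + e^\alpha)^{-n}\,d\alpha = \Gamma(y)\Gamma(n-y)/\Gamma(n)$, which is at least $1/\Gamma(n)$ for integers $1 \le y \le n-1$.

A quick numerical check in Python is shown below, using SciPy's `quad`; the function names are illustrative.

```python
import math

import numpy as np
from scipy.integrate import quad


def beta_integral_numeric(y, n):
    """Numerically integrate e^{y a} / (1 + e^a)^n over the real line."""
    integrand = lambda a: np.exp(y * a - n * np.logaddexp(0.0, a))  # log scale for stability
    value, _ = quad(integrand, -np.inf, np.inf)
    return value


def beta_integral_closed_form(y, n):
    return math.gamma(y) * math.gamma(n - y) / math.gamma(n)


for y, n in [(1, 2), (3, 10), (5, 7)]:
    print(y, n, beta_integral_numeric(y, n), beta_integral_closed_form(y, n))

# The bound 1 + e^{theta + alpha} <= max(1, e^theta) (1 + e^alpha) on a grid
theta, alpha = np.meshgrid(np.linspace(-8, 8, 161), np.linspace(-8, 8, 161))
lhs = 1.0 + np.exp(theta + alpha)
rhs = np.maximum(1.0, np.exp(theta)) * (1.0 + np.exp(alpha))
print(bool(np.all(lhs <= rhs * (1.0 + 1e-12))))
```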

We next consider the log-log link, $F(x) = \exp(-\exp(-x))$, for which $F^{x_{ij}}(\theta_i+\alpha_j) = e^{-x_{ij}e^{-(\theta_i+\alpha_j)}}$. As before,

$$\int_{R^{k-1}} \left|1 - e^{-\alpha'\alpha/2\sigma^2}\right| L(x;\alpha)\,d\alpha \le \frac{1}{2\sigma^2}\sum_{j=1}^{k-1}\int_{R^{k-1}} \alpha_j^2\,L(x;\alpha)\,d\alpha.$$

Split each inner $\alpha_j$-integral over $(-\infty, 0]$ and $[0, \infty)$, and complete the square in each $\theta_i$. The right-hand side is then bounded by $C_2/\sigma^2$ times a finite quantity that does not depend on $x$, so it tends to 0 as $\sigma \to \infty$.


Next we bound the $\alpha$-integrals from below and again complete the square in each $\theta_i$. This shows that $\int_{R^{k-1}} L(x;\alpha)\,d\alpha$ is bounded below by a strictly positive constant $c_2(\sigma_0\sqrt{2\pi})^n$ times a factor that depends only on $n$, $k$, $\mu_0$ and $\sigma_0$. This lower bound is finite and independent of $x$.

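Lastly, consider the probit link model. For this model the likelihood function is

$$L(\theta,\alpha|x) \propto \prod_{i=1}^{n}\left[\prod_{j=1}^{k-1}\Phi^{x_{ij}}(\theta_i+\alpha_j)\,\bar\Phi^{1-x_{ij}}(\theta_i+\alpha_j)\right]\Phi^{x_{ik}}(\theta_i)\,\bar\Phi^{1-x_{ik}}(\theta_i)\times\exp\left(-\sum_{i=1}^{n}\frac{(\theta_i-\mu_0)^2}{2\sigma_0^2}\right), \qquad (2.6.5)$$

where $\bar\Phi = 1 - \Phi$. Hence the marginal likelihood function, denoted by $L(x;\alpha)$, is

$$L(x;\alpha) = \int_{R^n} L(\theta,\alpha|x)\,d\theta. \qquad (2.6.6)$$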


We note that, as before,

$$\int_{R^{k-1}} \left|1 - e^{-\alpha'\alpha/2\sigma^2}\right| L(x;\alpha)\,d\alpha \le \frac{1}{2\sigma^2}\sum_{j=1}^{k-1}\int_{R^{k-1}} \alpha_j^2\,L(x;\alpha)\,d\alpha.$$

Next, bound the inner $\alpha_j$-integrals and complete the square in each $\theta_i$, as in the logit case. The right-hand side is then at most a constant (depending only on $k$) times $M/\sigma^2$, where $M$ ($< \infty$) is independent of $x$. Hence it tends to 0 as $\sigma \to \infty$.


Next we find a lower bound for $\int_{R^{k-1}} L(x;\alpha)\,d\alpha$. We proceed as in the logit case, now using the inequality $1 + e^{4(\theta+\alpha)} \le \max(1, e^{4\theta})(1 + e^{4\alpha})$ and the Beta integral $\int_0^\infty u^{y-1}(1+u)^{-n}\,du = \Gamma(y)\Gamma(n-y)/\Gamma(n)$. Completing the square in each $\theta_i$ then gives a strictly positive lower bound, which is finite and independent of $x$.

Hence, for all three links, we have shown that a flat prior can be uniformly approximated by proper priors of the form (2.6.1) with $\pi(u) = \exp(-u'u/2)$.


2.7    Consistency  of  Marginal  Maximum  Likelihood  Estimator 


In the item response literature, the most widely used estimation technique is perhaps conditional maximum likelihood. Because that method depends on the existence of sufficient statistics for the nuisance parameters, an alternative approach is also used: marginal maximum likelihood estimation.

Marginal maximum likelihood estimators are known to be consistent. To our knowledge, however, a formal proof of consistency for general item response models based on a parametric approach seems to be lacking. There are results on the consistency of marginal likelihood estimators based on either a nonparametric or a semiparametric approach.

- De Leeuw and Verhelst (1986) showed, using a nonparametric approach, that for a generalized Rasch model the probability that the marginal maximum likelihood estimator equals the conditional likelihood estimator approaches unity as the number of subjects becomes large.
- Lindsay, Clogg and Grego (1991) used a semiparametric approach to reach the same conclusion as De Leeuw and Verhelst (1986), but under less stringent conditions.
- Follman (1988) proved consistency under conditions similar to those of Lindsay et al. (1991), through a purely nonparametric approach.

In this section we give a formal proof of consistency of marginal maximum likelihood estimators for general item response models.

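We begin with the model given in (1.1.1), namely,

$$p_{ij} = F(\theta_i + \alpha_j). \qquad (2.7.1)$$

Then the likelihood function is

$$L(\theta,\alpha) = \prod_{i=1}^{n}\left[\prod_{j=1}^{k-1}F^{x_{ij}}(\theta_i+\alpha_j)\{1-F(\theta_i+\alpha_j)\}^{1-x_{ij}}\right]F^{x_{ik}}(\theta_i)\{1-F(\theta_i)\}^{1-x_{ik}}.$$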


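We assume that the subject ability parameters $\theta_i$ follow a distribution with probability density function $\pi(\theta)$. Then the marginal likelihood for the item parameters $\alpha$ is

$$L(\alpha) = \prod_{i=1}^{n}\int\left[\prod_{j=1}^{k-1}F^{x_{ij}}(\theta+\alpha_j)\{1-F(\theta+\alpha_j)\}^{1-x_{ij}}\right]F^{x_{ik}}(\theta)\{1-F(\theta)\}^{1-x_{ik}}\,\pi(\theta)\,d\theta = \prod_{i=1}^{n}c_\pi(x_i,\alpha), \text{ say},$$

where

$$c_\pi(x_i,\alpha) = \int\left[\prod_{j=1}^{k-1}F^{x_{ij}}(\theta+\alpha_j)\{1-F(\theta+\alpha_j)\}^{1-x_{ij}}\right]F^{x_{ik}}(\theta)\{1-F(\theta)\}^{1-x_{ik}}\,\pi(\theta)\,d\theta.$$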


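Hence,

$$\log L(\alpha) = \sum_{i=1}^{n}\log c_\pi(x_i,\alpha).$$

Let $U(x,\alpha) = \left(\frac{\partial\log c_\pi(x,\alpha)}{\partial\alpha_1}, \ldots, \frac{\partial\log c_\pi(x,\alpha)}{\partial\alpha_{k-1}}\right)'$. Then

$$\frac{\partial\log L(\alpha)}{\partial\alpha} = \sum_{i=1}^{n}U(x_i,\alpha).$$

Also, let

$$S_n(\alpha) = \frac{1}{n}\frac{\partial\log L(\alpha)}{\partial\alpha} = \frac{1}{n}\sum_{i=1}^{n}U(x_i,\alpha).$$

The MMLE of $\alpha$ is the solution of the marginal likelihood equations $S_n(\alpha) = 0$. Let $\alpha_0$ be the unknown true value of the parameter vector $\alpha$, and let $\Theta$ be an open subset of Euclidean $(k-1)$-space $E^{k-1}$ containing $\alpha_0$.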

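We now state the following theorem.

Theorem 7. Assume the following conditions.

(1) The derivatives $\partial\log c_\pi(X,\alpha)/\partial\alpha_i$ and $\partial^2\log c_\pi(X,\alpha)/\partial\alpha_i\partial\alpha_j$, $i, j = 1, \ldots, k-1$, all exist and are continuous on $\Theta$.

(2) The matrix $S'_n(\alpha_0)$, with $(i,j)$th element

$$\frac{1}{n}\sum_{l=1}^{n}\left.\frac{\partial^2\log c_\pi(X_l,\alpha)}{\partial\alpha_i\,\partial\alpha_j}\right|_{\alpha=\alpha_0},$$

is negative definite with probability tending to unity as $n \to \infty$.

(3) $S'_n(\alpha)$ converges uniformly in probability to $h(\alpha)$ in an open neighbourhood of $\alpha_0$, where the $(i,j)$th element of $h(\alpha)$ is

$$E_{\alpha_0}\left[\frac{\partial^2\log c_\pi(X,\alpha)}{\partial\alpha_i\,\partial\alpha_j}\right], \qquad i, j = 1, \ldots, k-1.$$

(4) $$E_{\alpha_0}\left[\left.\frac{\partial\log c_\pi(X,\alpha)}{\partial\alpha}\right|_{\alpha=\alpha_0}\right] = 0.$$

Then there exists a sequence $\{\hat\alpha_n\}$ such that

$$S_n(\hat\alpha_n) = 0 \qquad (2.7.2)$$

with probability tending to unity as $n \to \infty$, and

$$\hat\alpha_n \xrightarrow{P_{\alpha_0}} \alpha_0, \qquad (2.7.3)$$

where $\xrightarrow{P_{\alpha_0}}$ denotes convergence in probability when the probability is computed under $\alpha_0$. If $\{\tilde\alpha_n\}$ also satisfies (2.7.2) and (2.7.3), then $\tilde\alpha_n = \hat\alpha_n$ with probability tending to unity as $n \to \infty$.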

The  proof  of  this  theorem  can  be  found  in  Foutz  (1977). 

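To apply this result in our context, we need to check its conditions. First consider condition (4). Because $X$ takes only finitely many response patterns, the integral below is a finite sum and we may differentiate under it. Since $\int c_\pi(x,\alpha)\,dx = 1$ for every $\alpha$,

$$E_{\alpha_0}\left[\left.\frac{\partial\log c_\pi(X,\alpha)}{\partial\alpha}\right|_{\alpha=\alpha_0}\right] = \int\left.\frac{\partial\log c_\pi(x,\alpha)}{\partial\alpha}\right|_{\alpha=\alpha_0}c_\pi(x,\alpha_0)\,dx = \left.\frac{\partial}{\partial\alpha}\int c_\pi(x,\alpha)\,dx\right|_{\alpha=\alpha_0} = 0.$$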
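Condition (4) can be checked numerically in a simple case. The sketch assumes $k = 2$, a logistic $F$ and $\pi = N(0, 1)$; these are illustrative choices, and the function names are hypothetical.

Under those assumptions, the code computes $c_\pi(x, \alpha)$ by Gauss-Hermite quadrature for the four response patterns and evaluates the expected score by central differences. The result is zero up to numerical error, because the pattern probabilities sum to one for every $\alpha$.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

Z, W = hermegauss(60)
W = W / np.sqrt(2.0 * np.pi)       # quadrature weights for pi = N(0, 1)


def F(t):
    """Logistic distribution function (assumed link)."""
    return 1.0 / (1.0 + np.exp(-t))


def c_pi(x, alpha):
    """Marginal probability of the response pattern x = (x1, x2) when k = 2."""
    x1, x2 = x
    Fa, F0 = F(Z + alpha), F(Z)
    return W @ (Fa**x1 * (1 - Fa)**(1 - x1) * F0**x2 * (1 - F0)**(1 - x2))


def expected_score(alpha0, h=1e-5):
    """E_{alpha0}[d log c_pi(X, alpha) / d alpha at alpha0], summed over patterns."""
    total = 0.0
    for x in itertools.product((0, 1), repeat=2):
        score = (np.log(c_pi(x, alpha0 + h)) - np.log(c_pi(x, alpha0 - h))) / (2 * h)
        total += c_pi(x, alpha0) * score
    return total


print(expected_score(0.7))   # close to zero
```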


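Next consider condition (2). By the Strong Law of Large Numbers (SLLN),

$$\frac{1}{n}\sum_{l=1}^{n}\left.\frac{\partial^2\log c_\pi(X_l,\alpha)}{\partial\alpha_i\,\partial\alpha_j}\right|_{\alpha=\alpha_0} \xrightarrow{\text{wp1}} E_{\alpha_0}\left[\left.\frac{\partial^2\log c_\pi(X,\alpha)}{\partial\alpha_i\,\partial\alpha_j}\right|_{\alpha=\alpha_0}\right],$$

where wp1 denotes convergence with probability 1. The limiting matrix is $-I(\alpha_0) < 0$, where $I(\alpha)$ is the Fisher information matrix, so that $h(\alpha) = -I(\alpha)$.

Next we consider the third condition. Note that

$$\sup_{|\alpha-\alpha_0|<\delta}|S'_n(\alpha) - h(\alpha)| \le \sup_{|\alpha-\alpha_0|<\delta}|S'_n(\alpha) - S'_n(\alpha_0)| + |S'_n(\alpha_0) - h(\alpha_0)| + \sup_{|\alpha-\alpha_0|<\delta}|h(\alpha) - h(\alpha_0)|. \qquad (2.7.4)$$

The third term of (2.7.4) goes to 0 as $\delta \to 0$ because $h(\alpha)$, equivalently $I(\alpha)$, is continuous. The second term goes to 0 in probability by the Weak Law of Large Numbers (WLLN). So it is enough to show that the first term goes to 0 in probability as $n$ approaches infinity. This holds under certain conditions, which we now state as a theorem.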
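Theorem 8. Let $f = F'$, and suppose the following conditions hold:

$$\sup_{|\alpha-\alpha_0|<\delta}|f(\theta+\alpha) - f(\theta+\alpha_0)| \le g(\theta,\delta) \to 0 \quad \text{as } \delta \to 0,$$
$$\sup_{|\alpha-\alpha_0|<\delta}|f(\theta+\alpha)| \le g_0(\theta,\delta),$$
$$\sup_{|\alpha-\alpha_0|<\delta}|f^2(\theta+\alpha) - f^2(\theta+\alpha_0)| \le g_1(\theta,\delta) \to 0 \quad \text{as } \delta \to 0,$$
$$\sup_{|\alpha-\alpha_0|<\delta}|f^2(\theta+\alpha)| \le g_2(\theta,\delta),$$
$$\sup_{|\alpha-\alpha_0|<\delta}|f'(\theta+\alpha) - f'(\theta+\alpha_0)| \le g_3(\theta,\delta) \to 0 \quad \text{as } \delta \to 0,$$
$$\sup_{|\alpha-\alpha_0|<\delta}|f'(\theta+\alpha)| \le g_4(\theta,\delta).$$

Then condition (3) of Theorem 7 is satisfied.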
Proof. 

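Note that in this problem $S'_n(\alpha)$ has the following form; here $k = 2$, so $\alpha$ is a scalar. Let

$$G_i(\theta,\alpha) = F^{x_{i1}}(\theta+\alpha)\{1-F(\theta+\alpha)\}^{1-x_{i1}}F^{x_{i2}}(\theta)\{1-F(\theta)\}^{1-x_{i2}},$$

so that $c_\pi(x_i,\alpha) = \int G_i(\theta,\alpha)\,\pi(\theta)\,d\theta$. Then

$$S'_n(\alpha) = -\frac{1}{n}\sum_{i=1}^{n}\frac{1}{c_\pi^2(x_i,\alpha)}\left[\int\left(\frac{x_{i1}}{F(\theta+\alpha)} - \frac{1-x_{i1}}{1-F(\theta+\alpha)}\right)G_i(\theta,\alpha)\,f(\theta+\alpha)\,\pi(\theta)\,d\theta\right]^2$$
$$\quad - \frac{1}{n}\sum_{i=1}^{n}\frac{1}{c_\pi(x_i,\alpha)}\int\left(\frac{x_{i1}}{F^2(\theta+\alpha)} + \frac{1-x_{i1}}{(1-F(\theta+\alpha))^2}\right)G_i(\theta,\alpha)\,f^2(\theta+\alpha)\,\pi(\theta)\,d\theta$$
$$\quad + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{c_\pi(x_i,\alpha)}\int\left(\frac{x_{i1}}{F(\theta+\alpha)} - \frac{1-x_{i1}}{1-F(\theta+\alpha)}\right)G_i(\theta,\alpha)\,f'(\theta+\alpha)\,\pi(\theta)\,d\theta$$
$$\quad + \frac{1}{n}\sum_{i=1}^{n}\frac{1}{c_\pi(x_i,\alpha)}\int\left(\frac{x_{i1}}{F(\theta+\alpha)} - \frac{1-x_{i1}}{1-F(\theta+\alpha)}\right)^2G_i(\theta,\alpha)\,f^2(\theta+\alpha)\,\pi(\theta)\,d\theta.$$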
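The four-term form of $S'_n(\alpha)$ is the second derivative of $\log c_\pi(x, \alpha)$ in $\alpha$. It follows from $(\log c)'' = c''/c - (c'/c)^2$: differentiating the integrand twice produces the $f^2$, $f'$ and squared-bracket terms. Writing the four terms as $-A^2 - B + C + E$, the sketch below compares this expression with a finite-difference second derivative.

The check is in Python and assumes $k = 2$, a logistic $F$ and $\pi = N(0, 1)$, all illustrative. For binary $x_{i1}$ the second and fourth terms coincide (since $x^2 = x$) and cancel; the sketch keeps all four to mirror the expression in the text.

```python
import itertools

import numpy as np
from numpy.polynomial.hermite_e import hermegauss

Z, W = hermegauss(60)
W = W / np.sqrt(2.0 * np.pi)       # quadrature weights for pi = N(0, 1)


def F(t):                          # logistic distribution function (assumed)
    return 1.0 / (1.0 + np.exp(-t))


def f(t):                          # logistic density F'
    return F(t) * (1.0 - F(t))


def f_prime(t):                    # derivative of the logistic density
    return f(t) * (1.0 - 2.0 * F(t))


def pattern_weights(x, alpha):
    x1, x2 = x
    Fa, F0 = F(Z + alpha), F(Z)
    return Fa**x1 * (1 - Fa)**(1 - x1) * F0**x2 * (1 - F0)**(1 - x2)


def log_c(x, alpha):
    return np.log(W @ pattern_weights(x, alpha))


def second_derivative_four_terms(x, alpha):
    """-A^2 - B + C + E: the four-term form of d^2 log c_pi / d alpha^2."""
    x1 = x[0]
    t = Z + alpha
    Fa = F(t)
    G = pattern_weights(x, alpha)
    c = W @ G
    D = x1 / Fa - (1 - x1) / (1 - Fa)
    A = W @ (D * G * f(t)) / c
    B = W @ ((x1 / Fa**2 + (1 - x1) / (1 - Fa)**2) * G * f(t)**2) / c
    C = W @ (D * G * f_prime(t)) / c
    E = W @ (D**2 * G * f(t)**2) / c
    return -A**2 - B + C + E


h = 1e-4
for x in itertools.product((0, 1), repeat=2):
    fd = (log_c(x, 0.3 + h) - 2 * log_c(x, 0.3) + log_c(x, 0.3 - h)) / h**2
    print(x, fd, second_derivative_four_terms(x, 0.3))
```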


So  the  expression  sup|Q_ao|<(5  \S'n(a)  -  S'n{a0)\  becomes, 


sup  |5;(a)-5;(ao)| 

\a-a0\<6 


1  " 

=  -E 


sup 

|a-ao|<<5 


A' 


!  1 


1-X 


FXil  (9  +  a) 


F(9  +  a)     l-F{9  +  a) 
(1  -  F(6>  +  a))1-*'1  FA'i2(0)  (1  -  F(9))l~Xi2  f(9  +  a)n(9)d9 
Xn  1  —  Xn 


—  / 


F(0  +  ao)     l-F(0  +  ao) 


FAll(0  +  ao)  x 


(1  -  F(0  +  ao))1"*"  FA'«2(0)  (1  -  F{6)f-Xa  f(9  +  ao)7r(0)d0  j 


G3 


+ 


 !_/ 

cn(x,a)  J 


X{\  1  —  Xn 


FXn  {6  +  a)x 


F2(6  +  a)     (l-F(0  +  a))2_ 
(1  -  F(6  +  a))1-*11  FXi2{9)  (1  -  F(9))l~x'2  f2{6  +  a)n(6)d9 
Xn  1  —  Xn 


an)  J 


cn(x,a0) 


Fx"(9  +  a0)  x 


F*(9  +  a0)     (l-F(0  +  ao))2. 
(1  -  F{9  +  an))1"*'1  Fx»{6)  (1  -  F(9))l~Xa  f2(9  +  <*)ir(0)<l0 
Xn  1  —  Xn 


+ 


F*"(0  +  a)  x 


F(0  +  a)     l-F(0  +  a)_ 
(1  -  F(6  +  a))1_A'u  FA'2 (0)  (1  -  F{6))l-Xi2  f'(9  +  a)n{9)d9 
Xn  1  —  Xn 


5.  an)  * 


c^z.an) 


FXil(6  +  a0)  x 


F(0  +  ao)     l-F(0  +  ao) 
(1  -  F(6  +  aj,))1"*1  Fx'2(0)  (1  -  F(9))1~Xi2  f'(6  +  a0)n(9)d9 
Xn  1  —  Xji 


+ 


J  1 

[/[ 

F(0  +  ao)     l-F(0  +  ao) 


FAil(0  +  ao)  x 


(1  -  F(6  +  a0))l-Xil  Fx'2(9)  (1  -  F(6))l-*a  f(9  +  a0)n(6)d6 


1 


cl(x,a) 


Xn 


l-Xa 


F(0  +  a)     l-F(0  +  a) 


FXil(9  +  a)  x 


(1  -  F(9  +  a))l~Xil  Fx'2(9)  (1  -  F(0))1_A"  f(9  +  a)n{9)d9 


\-xi2 


(2.7.6) 


Now  consider  the  first  pair  of  terms  in  (2.7.6). 

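$$
\begin{aligned}
&\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\bigg|\frac{1}{c_\pi(\mathbf{x}_i,\alpha)}\int\Big[\frac{x_{i1}}{F(\theta+\alpha)}-\frac{1-x_{i1}}{1-F(\theta+\alpha)}\Big]F^{x_{i1}}(\theta+\alpha)(1-F(\theta+\alpha))^{1-x_{i1}}F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f(\theta+\alpha)\pi(\theta)\,d\theta\\
&\qquad-\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big[\frac{x_{i1}}{F(\theta+\alpha_0)}-\frac{1-x_{i1}}{1-F(\theta+\alpha_0)}\Big]F^{x_{i1}}(\theta+\alpha_0)(1-F(\theta+\alpha_0))^{1-x_{i1}}F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f(\theta+\alpha_0)\pi(\theta)\,d\theta\bigg|\\
&\le\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\Big|\frac{1}{c_\pi(\mathbf{x}_i,\alpha)}-\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\Big|\int\Big|\frac{x_{i1}}{F(\theta+\alpha)}-\frac{1-x_{i1}}{1-F(\theta+\alpha)}\Big|F^{x_{i1}}(\theta+\alpha)(1-F(\theta+\alpha))^{1-x_{i1}}F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f(\theta+\alpha)\pi(\theta)\,d\theta\\
&\quad+\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big|\frac{x_{i1}}{F(\theta+\alpha)}-\frac{1-x_{i1}}{1-F(\theta+\alpha)}-\frac{x_{i1}}{F(\theta+\alpha_0)}+\frac{1-x_{i1}}{1-F(\theta+\alpha_0)}\Big|F^{x_{i1}}(\theta+\alpha)(1-F(\theta+\alpha))^{1-x_{i1}}F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f(\theta+\alpha)\pi(\theta)\,d\theta\\
&\quad+\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big|\frac{x_{i1}}{F(\theta+\alpha_0)}-\frac{1-x_{i1}}{1-F(\theta+\alpha_0)}\Big|(1-F(\theta+\alpha))^{1-x_{i1}}\big|F^{x_{i1}}(\theta+\alpha)-F^{x_{i1}}(\theta+\alpha_0)\big|F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f(\theta+\alpha)\pi(\theta)\,d\theta\\
&\quad+\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big|\frac{x_{i1}}{F(\theta+\alpha_0)}-\frac{1-x_{i1}}{1-F(\theta+\alpha_0)}\Big|F^{x_{i1}}(\theta+\alpha_0)\big|(1-F(\theta+\alpha))^{1-x_{i1}}-(1-F(\theta+\alpha_0))^{1-x_{i1}}\big|F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f(\theta+\alpha)\pi(\theta)\,d\theta\\
&\quad+\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big|\frac{x_{i1}}{F(\theta+\alpha_0)}-\frac{1-x_{i1}}{1-F(\theta+\alpha_0)}\Big|F^{x_{i1}}(\theta+\alpha_0)(1-F(\theta+\alpha_0))^{1-x_{i1}}\big|f(\theta+\alpha)-f(\theta+\alpha_0)\big|F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}\pi(\theta)\,d\theta. \qquad (2.7.7)
\end{aligned}
$$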


Suppose $\sup_{|\alpha-\alpha_0|<\delta}|f(\theta+\alpha)-f(\theta+\alpha_0)|$ is bounded and goes to 0 as $\delta\to0$; in other words, suppose $\sup_{|\alpha-\alpha_0|<\delta}|f(\theta+\alpha)-f(\theta+\alpha_0)|\le g(\delta)$, where $g(\delta)\to0$ as $\delta\to0$. Then the last term of (2.7.7) converges in probability to 0, i.e.,


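$$
\begin{aligned}
&\frac1n\sum_{i=1}^n\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big|\frac{x_{i1}}{F(\theta+\alpha_0)}-\frac{1-x_{i1}}{1-F(\theta+\alpha_0)}\Big|F^{x_{i1}}(\theta+\alpha_0)(1-F(\theta+\alpha_0))^{1-x_{i1}}\sup_{|\alpha-\alpha_0|<\delta}\big|f(\theta+\alpha)-f(\theta+\alpha_0)\big|F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}\pi(\theta)\,d\theta\\
&\le g(\delta)\,\frac1n\sum_{i=1}^n\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big|\frac{x_{i1}}{F(\theta+\alpha_0)}-\frac{1-x_{i1}}{1-F(\theta+\alpha_0)}\Big|F^{x_{i1}}(\theta+\alpha_0)(1-F(\theta+\alpha_0))^{1-x_{i1}}F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}\pi(\theta)\,d\theta\\
&\xrightarrow[n\to\infty]{P} g(\delta)\,E\bigg[\frac{1}{c_\pi(\mathbf{X},\alpha_0)}\int\Big|\frac{X_1}{F(\theta+\alpha_0)}-\frac{1-X_1}{1-F(\theta+\alpha_0)}\Big|F^{X_1}(\theta+\alpha_0)(1-F(\theta+\alpha_0))^{1-X_1}F^{X_2}(\theta)(1-F(\theta))^{1-X_2}\pi(\theta)\,d\theta\bigg]\\
&\to 0\quad\text{as }\delta\to0.
\end{aligned}
$$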


Note  that  the  last  step  follows  from  the  Lebesgue  dominated  convergence  theorem. 
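Now the fourth term of (2.7.7) is at most

$$
\begin{aligned}
&\frac1n\sum_{i=1}^n\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big|\frac{x_{i1}}{F(\theta+\alpha_0)}-\frac{1-x_{i1}}{1-F(\theta+\alpha_0)}\Big|\sup_{|\alpha-\alpha_0|<\delta}|f(\theta+\alpha)|\,\pi(\theta)\big|(1-F(\theta+\alpha_0-\delta))^{1-x_{i1}}-(1-F(\theta+\alpha_0))^{1-x_{i1}}\big|F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}\,d\theta\\
&\le\frac1n\sum_{i=1}^n\frac{M}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big|\frac{x_{i1}}{F(\theta+\alpha_0)}-\frac{1-x_{i1}}{1-F(\theta+\alpha_0)}\Big|\pi(\theta)\big|(1-F(\theta+\alpha_0-\delta))^{1-x_{i1}}-(1-F(\theta+\alpha_0))^{1-x_{i1}}\big|F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}\,d\theta\\
&\to0\quad\text{as }\delta\to0,
\end{aligned}
$$

provided $\sup_{|\alpha-\alpha_0|<\delta}|f(\theta+\alpha)|\le M$.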

Similarly, the third term of (2.7.7) converges in probability to 0 as $n\to\infty$ and $\delta\to0$, by the boundedness of $f(\theta+\alpha)$ and the continuity of $F(\cdot)$, after invoking the Lebesgue dominated convergence theorem. Proceeding exactly as above, the boundedness of $f(\theta+\alpha)$ ensures the convergence of the first and second terms of (2.7.7) to 0 in probability. To show the convergence in probability of the second pair of terms in (2.7.6), we decompose these terms exactly as we did for the first pair of terms in (2.7.6). This gives the following decomposition.


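$$
\begin{aligned}
&\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\bigg|\frac{1}{c_\pi(\mathbf{x}_i,\alpha)}\int\Big[\frac{x_{i1}}{F^2(\theta+\alpha)}+\frac{1-x_{i1}}{(1-F(\theta+\alpha))^2}\Big]F^{x_{i1}}(\theta+\alpha)(1-F(\theta+\alpha))^{1-x_{i1}}F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f^2(\theta+\alpha)\pi(\theta)\,d\theta\\
&\qquad-\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big[\frac{x_{i1}}{F^2(\theta+\alpha_0)}+\frac{1-x_{i1}}{(1-F(\theta+\alpha_0))^2}\Big]F^{x_{i1}}(\theta+\alpha_0)(1-F(\theta+\alpha_0))^{1-x_{i1}}F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f^2(\theta+\alpha_0)\pi(\theta)\,d\theta\bigg|\\
&\le\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\Big|\frac{1}{c_\pi(\mathbf{x}_i,\alpha)}-\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\Big|\int\Big[\frac{x_{i1}}{F^2(\theta+\alpha)}+\frac{1-x_{i1}}{(1-F(\theta+\alpha))^2}\Big]F^{x_{i1}}(\theta+\alpha)(1-F(\theta+\alpha))^{1-x_{i1}}F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f^2(\theta+\alpha)\pi(\theta)\,d\theta\\
&\quad+\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big|\frac{x_{i1}}{F^2(\theta+\alpha)}+\frac{1-x_{i1}}{(1-F(\theta+\alpha))^2}-\frac{x_{i1}}{F^2(\theta+\alpha_0)}-\frac{1-x_{i1}}{(1-F(\theta+\alpha_0))^2}\Big|F^{x_{i1}}(\theta+\alpha)(1-F(\theta+\alpha))^{1-x_{i1}}F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f^2(\theta+\alpha)\pi(\theta)\,d\theta\\
&\quad+\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big[\frac{x_{i1}}{F^2(\theta+\alpha_0)}+\frac{1-x_{i1}}{(1-F(\theta+\alpha_0))^2}\Big](1-F(\theta+\alpha))^{1-x_{i1}}\big|F^{x_{i1}}(\theta+\alpha)-F^{x_{i1}}(\theta+\alpha_0)\big|F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f^2(\theta+\alpha)\pi(\theta)\,d\theta\\
&\quad+\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big[\frac{x_{i1}}{F^2(\theta+\alpha_0)}+\frac{1-x_{i1}}{(1-F(\theta+\alpha_0))^2}\Big]F^{x_{i1}}(\theta+\alpha_0)\big|(1-F(\theta+\alpha))^{1-x_{i1}}-(1-F(\theta+\alpha_0))^{1-x_{i1}}\big|F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}f^2(\theta+\alpha)\pi(\theta)\,d\theta\\
&\quad+\frac1n\sum_{i=1}^n\sup_{|\alpha-\alpha_0|<\delta}\frac{1}{c_\pi(\mathbf{x}_i,\alpha_0)}\int\Big[\frac{x_{i1}}{F^2(\theta+\alpha_0)}+\frac{1-x_{i1}}{(1-F(\theta+\alpha_0))^2}\Big]F^{x_{i1}}(\theta+\alpha_0)(1-F(\theta+\alpha_0))^{1-x_{i1}}\big|f^2(\theta+\alpha)-f^2(\theta+\alpha_0)\big|F^{x_{i2}}(\theta)(1-F(\theta))^{1-x_{i2}}\pi(\theta)\,d\theta. \qquad (2.7.8)
\end{aligned}
$$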


Under the assumption that $\sup_{|\alpha-\alpha_0|<\delta}|f^2(\theta+\alpha)-f^2(\theta+\alpha_0)|$ is bounded by a function not depending on $\alpha$ and goes to 0 as $\delta\to0$, the last term of (2.7.8) approaches 0 in probability as $n\to\infty$ and $\delta\to0$. All other terms in (2.7.8) converge in probability to 0 provided $f^2(\theta+\alpha)$ is bounded by a function independent of $\alpha$. The conditions needed for the convergence in probability of the third pair of terms in (2.7.6) are:

(i) $\sup_{|\alpha-\alpha_0|<\delta}|f'(\theta+\alpha)-f'(\theta+\alpha_0)|\le g_3(\theta,\delta)\to0$ as $\delta\to0$,

and

(ii) $\sup_{|\alpha-\alpha_0|<\delta}|f'(\theta+\alpha)|\le g_4(\theta,\delta)$.


The final pair of terms in (2.7.6), being of the form $a^2-b^2$, can be shown to converge in probability to 0 by factoring it as $(a+b)(a-b)$ and applying the techniques and conditions used for the first pair of terms in (2.7.6). This completes the proof of the theorem.
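The consistency result can be illustrated by simulation. The sketch below is not part of the original analysis: it assumes the logit link, a known subject prior $\pi = N(0,1)$, and a hypothetical true contrast $\alpha_0 = 0.7$, in the matched-pair model where item 1 has success probability $F(\theta+\alpha)$ and item 2 has $F(\theta)$. Since the marginal likelihood depends on the data only through the counts of the four response patterns, $c_\pi(\mathbf{x},\alpha)$ is computed once per pattern by the trapezoid rule, and the marginal ML estimate is located by golden-section search (which assumes the log-likelihood is unimodal on $[-3,3]$). All function names are illustrative.

```python
import math
import random

def logistic(z):
    """Logit link F(z) = 1 / (1 + exp(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Trapezoid rule on [-8, 8] for the (assumed known) subject prior pi = N(0, 1)
GRID = [-8.0 + 0.02 * t for t in range(801)]
WEIGHTS = [0.02 * math.exp(-0.5 * th * th) / math.sqrt(2.0 * math.pi) for th in GRID]
WEIGHTS[0] *= 0.5
WEIGHTS[-1] *= 0.5

def c_pi(x1, x2, alpha):
    """Marginal probability of the pair (x1, x2): item 1 uses F(theta + alpha), item 2 uses F(theta)."""
    total = 0.0
    for th, w in zip(GRID, WEIGHTS):
        p1, p2 = logistic(th + alpha), logistic(th)
        total += w * (p1 if x1 else 1.0 - p1) * (p2 if x2 else 1.0 - p2)
    return total

def simulate_counts(n, alpha0, seed=1):
    """Counts of the four response patterns for n simulated subjects."""
    rng = random.Random(seed)
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for _ in range(n):
        th = rng.gauss(0.0, 1.0)
        x1 = int(rng.random() < logistic(th + alpha0))
        x2 = int(rng.random() < logistic(th))
        counts[(x1, x2)] += 1
    return counts

def marginal_loglik(alpha, counts):
    # The data enter the marginal likelihood only through the pattern counts
    return sum(m * math.log(c_pi(x1, x2, alpha)) for (x1, x2), m in counts.items() if m > 0)

def mml_estimate(counts, lo=-3.0, hi=3.0, iters=60):
    """Golden-section search for the maximizer of the marginal log-likelihood on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if marginal_loglik(c, counts) > marginal_loglik(d, counts):
            b = d
        else:
            a = c
    return 0.5 * (a + b)

for n in (500, 5000, 20000):
    print(n, round(mml_estimate(simulate_counts(n, alpha0=0.7)), 3))
```

As n grows the printed estimates should settle near 0.7, in line with the theorem; the individual values depend on the random seed.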

We now check the conditions described in (2.7.5) for some commonly used link functions, viz., the logit, probit and log-log links. First consider the logit link. For this link function $F(x) = [1+\exp(-x)]^{-1}$, and hence

$$
\begin{aligned}
f(x) &= F(x)(1-F(x)) \le 1,\\
|f'(x)| &= |f(x)(1-F(x)) - F(x)f(x)| = |F(x)(1-F(x))(1-2F(x))| \le 1,\\
|f''(x)| &= F(x)(1-F(x))\,\big|1-6F(x)+6F^2(x)\big| \le F(x)(1-F(x)) \le 1,
\end{aligned}
$$

where the last line uses $-\tfrac12 \le 1-6F+6F^2 \le 1$ for $0\le F\le 1$.
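The three logit bounds can be spot-checked numerically. This minimal sketch evaluates $f$, $f'$ and $f''$ on a grid, checks the closed-form derivatives against central differences, and reports the largest absolute values (these are $1/4$, about $0.096$ and $1/8$, comfortably below 1). The grid range and step are arbitrary choices.

```python
import math

def F(x):
    """Logistic distribution function, evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def f(x):
    return F(x) * (1.0 - F(x))

def f1(x):
    # f'(x) = F(x)(1 - F(x))(1 - 2F(x))
    p = F(x)
    return p * (1.0 - p) * (1.0 - 2.0 * p)

def f2(x):
    # f''(x) = F(x)(1 - F(x))[1 - 6F(x) + 6F(x)^2]
    p = F(x)
    return p * (1.0 - p) * (1.0 - 6.0 * p + 6.0 * p * p)

def central_diff(g, x, h=1e-5):
    return (g(x + h) - g(x - h)) / (2.0 * h)

grid = [-20.0 + 0.01 * t for t in range(4001)]
sups = [max(abs(g(x)) for x in grid) for g in (f, f1, f2)]
print(sups)
```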

Hence all the conditions listed in (2.7.5) are satisfied, so for the logit link the marginal maximum likelihood estimator of the item difficulty parameter is consistent. Next consider the probit link. For this link $F(x) = \Phi(x)$, $\Phi$ being the standard normal distribution function; dropping the normalizing constant $(2\pi)^{-1/2}$, which plays no role in the argument, we take $f(x) = e^{-x^2/2}$. Then,

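$$
\begin{aligned}
\sup_{|\alpha-\alpha_0|<\delta}|f(\theta+\alpha)-f(\theta+\alpha_0)|
&= \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-(\theta+\alpha)^2/2}-e^{-(\theta+\alpha_0)^2/2}\big|\\
&\le \sup_{|\alpha-\alpha_0|<\delta}\big|-(\theta+\alpha_0)(\alpha-\alpha_0)e^{-(\theta+\alpha_0)^2/2}\big|\\
&\le \delta\,|\theta+\alpha_0| \qquad (2.7.9)\\
&= g(\theta,\delta)\ \text{(say)}.
\end{aligned}
$$

Note that by (2.7.9), $g(\theta,\delta)\to0$ as $\delta\to0$. Next,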


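$$\sup_{|\alpha-\alpha_0|<\delta}|f(\theta+\alpha)| = \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-(\theta+\alpha)^2/2}\big| \le 1 \qquad (\text{since } e^{-x^2/2}\le 1\ \forall x).$$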


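Also,

$$
\begin{aligned}
\sup_{|\alpha-\alpha_0|<\delta}|f^2(\theta+\alpha)-f^2(\theta+\alpha_0)|
&= \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-(\theta+\alpha)^2}-e^{-(\theta+\alpha_0)^2}\big|\\
&\le \sup_{|\alpha-\alpha_0|<\delta}\big|-2(\theta+\alpha_0)(\alpha-\alpha_0)e^{-(\theta+\alpha_0)^2}\big|\\
&\le 2\delta\,|\theta+\alpha_0| \qquad (2.7.10)\\
&\to 0 \quad\text{as } \delta\to0.
\end{aligned}
$$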


Next, it is straightforward to see that

$$\sup_{|\alpha-\alpha_0|<\delta}|f^2(\theta+\alpha)| = \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-(\theta+\alpha)^2}\big| \le 1.$$

Lastly, for the probit link $f'(x)$ becomes

$$f'(x) = -x\,e^{-x^2/2}.$$

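So, writing $\phi(x) = e^{-x^2/2}$ so that $f'(x) = -x\phi(x)$,

$$
\begin{aligned}
\sup_{|\alpha-\alpha_0|<\delta}|f'(\theta+\alpha)-f'(\theta+\alpha_0)|
&= \sup_{|\alpha-\alpha_0|<\delta}\big|(\theta+\alpha_0)\phi(\theta+\alpha_0)-(\theta+\alpha)\phi(\theta+\alpha)\big|\\
&= \sup_{|\alpha-\alpha_0|<\delta}\big|-(\alpha-\alpha_0)\phi(\theta+\alpha_0)-(\theta+\alpha)\{\phi(\theta+\alpha)-\phi(\theta+\alpha_0)\}\big|\\
&\le \delta\phi(\theta+\alpha_0)+\delta|\theta+\alpha_0|\phi(\theta+\alpha_0)\sup_{|\alpha-\alpha_0|<\delta}|\theta+\alpha|\\
&= \delta\phi(\theta+\alpha_0)+\delta|\theta+\alpha_0|\phi(\theta+\alpha_0)\sup_{|\alpha-\alpha_0|<\delta}|\theta+\alpha_0+\alpha-\alpha_0|\\
&\le \delta\phi(\theta+\alpha_0)+\delta|\theta+\alpha_0|\phi(\theta+\alpha_0)\big(\delta+|\theta+\alpha_0|\big)\\
&\to 0\quad\text{as }\delta\to0.
\end{aligned}
$$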


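Also note that

$$
\begin{aligned}
\sup_{|\alpha-\alpha_0|<\delta}|f'(\theta+\alpha)|
&= \sup_{|\alpha-\alpha_0|<\delta}\big|-(\theta+\alpha)\phi(\theta+\alpha)\big|\\
&\le \sup_{|\alpha-\alpha_0|<\delta}|\theta+\alpha| = \sup_{|\alpha-\alpha_0|<\delta}|\theta+\alpha_0+\alpha-\alpha_0|\\
&\le |\theta+\alpha_0|+\delta,
\end{aligned}
$$

which stays bounded as $\delta\to0$, so condition (ii) holds for the probit link.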
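A quick numerical look at the probit conditions. This sketch uses the text's unnormalized $f(x) = e^{-x^2/2}$ and approximates the moduli of continuity $\sup_{|\alpha-\alpha_0|<\delta}|g(\theta+\alpha)-g(\theta+\alpha_0)|$ for $g = f$, $f^2$ and $f'$ on a grid of $\alpha$ values, at a few illustrative values of $\theta+\alpha_0$. Independently of the bounds above, the mean value theorem caps each modulus at $\delta$ times the largest absolute derivative of $g$, which is at most 1 for all three functions, so the printed values shrink at least linearly in $\delta$.

```python
import math

def f(x):
    # probit density with the constant (2*pi)^(-1/2) dropped, as in the text
    return math.exp(-0.5 * x * x)

def f_sq(x):
    return f(x) ** 2

def f_prime(x):
    # f'(x) = -x exp(-x^2 / 2)
    return -x * math.exp(-0.5 * x * x)

def modulus(g, u0, delta, m=2001):
    """Grid approximation of sup_{|a - a0| <= delta} |g(theta + a) - g(theta + a0)|, with u0 = theta + a0."""
    return max(abs(g(u0 - delta + 2.0 * delta * t / (m - 1)) - g(u0)) for t in range(m))

for u0 in (-2.0, 0.0, 1.5):
    print(u0, [round(modulus(g, u0, d), 5) for g in (f, f_sq, f_prime) for d in (0.5, 0.05, 0.005)])
```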


We  now  investigate  whether  the  condition  (2.7.5)  is  satisfied  for  the  log-log  link. 
In  this  case, 

$$F(x) = \exp(-\exp(-x)),\qquad\text{so that}\qquad f(x) = e^{-x}\exp(-e^{-x}).$$

Then,


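$$
\begin{aligned}
\sup_{|\alpha-\alpha_0|<\delta}|f(\theta+\alpha)-f(\theta+\alpha_0)|
&= \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-(\theta+\alpha)}e^{-e^{-(\theta+\alpha)}}-e^{-(\theta+\alpha_0)}e^{-e^{-(\theta+\alpha_0)}}\big|\\
&= \sup_{|\alpha-\alpha_0|<\delta}\big|\big(e^{-(\theta+\alpha)}-e^{-(\theta+\alpha_0)}\big)e^{-e^{-(\theta+\alpha_0)}}+e^{-(\theta+\alpha)}\big(e^{-e^{-(\theta+\alpha)}}-e^{-e^{-(\theta+\alpha_0)}}\big)\big|\\
&\le \sup_{|\alpha-\alpha_0|<\delta}e^{-e^{-(\theta+\alpha_0)}}\big(e^{-(\theta+\alpha_0-\delta)}-e^{-(\theta+\alpha_0)}\big)+\sup_{|\alpha-\alpha_0|<\delta}e^{-(\theta+\alpha_0-\delta)}\big|e^{-e^{-(\theta+\alpha)}}-e^{-e^{-(\theta+\alpha_0)}}\big|\\
&\le e^{-e^{-(\theta+\alpha_0)}}e^{-(\theta+\alpha_0)}|e^{\delta}-1|+e^{-(\theta+\alpha_0-\delta)}\big|e^{-e^{-(\theta+\alpha_0+\delta)}}-e^{-e^{-(\theta+\alpha_0)}}\big|\\
&\le e^{-e^{-(\theta+\alpha_0)}}e^{-(\theta+\alpha_0)}|e^{\delta}-1|+e^{-(\theta+\alpha_0-\delta)}e^{-(\theta+\alpha_0)}|e^{\delta}-1|\\
&= g(\theta,\delta)\ \text{(say)}. \qquad (2.7.11)
\end{aligned}
$$

Note that by (2.7.11), $g(\theta,\delta)\to0$ as $\delta\to0$. Next,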


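$$
\begin{aligned}
\sup_{|\alpha-\alpha_0|<\delta}|f(\theta+\alpha)|
&= \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-(\theta+\alpha)}e^{-e^{-(\theta+\alpha)}}\big|\\
&\le e^{-(\theta+\alpha_0-\delta)}\sup_{|\alpha-\alpha_0|<\delta}\big|e^{-e^{-(\theta+\alpha)}}\big|\\
&\le e^{-(\theta+\alpha_0-\delta)}e^{-e^{-(\theta+\alpha_0+\delta)}}\\
&\le 1,
\end{aligned}
$$

since the last product equals $e^{2\delta}ye^{-y}$ with $y=e^{-(\theta+\alpha_0+\delta)}$, $ye^{-y}\le e^{-1}$ for all $y>0$, and $e^{2\delta-1}\le1$ for $\delta\le\tfrac12$.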


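Also,

$$
\begin{aligned}
\sup_{|\alpha-\alpha_0|<\delta}|f^2(\theta+\alpha)-f^2(\theta+\alpha_0)|
&= \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-2(\theta+\alpha)}e^{-2e^{-(\theta+\alpha)}}-e^{-2(\theta+\alpha_0)}e^{-2e^{-(\theta+\alpha_0)}}\big|\\
&= \sup_{|\alpha-\alpha_0|<\delta}\Big|\big(e^{-2(\theta+\alpha)}-e^{-2(\theta+\alpha_0)}\big)e^{-2e^{-(\theta+\alpha_0)}}+e^{-2(\theta+\alpha)}\big(e^{-2e^{-(\theta+\alpha)}}-e^{-2e^{-(\theta+\alpha_0)}}\big)\Big|\\
&\le e^{-2e^{-(\theta+\alpha_0)}}\big|e^{-2(\theta+\alpha_0-\delta)}-e^{-2(\theta+\alpha_0)}\big|+e^{-2(\theta+\alpha_0-\delta)}\big|e^{-2e^{-(\theta+\alpha_0+\delta)}}-e^{-2e^{-(\theta+\alpha_0)}}\big|\\
&\to 0\quad\text{as }\delta\to0.
\end{aligned}
$$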


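Also, it is evident that

$$
\begin{aligned}
\sup_{|\alpha-\alpha_0|<\delta}|f^2(\theta+\alpha)|
&= \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-2(\theta+\alpha)}e^{-2e^{-(\theta+\alpha)}}\big| \le \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-2(\theta+\alpha)}\big|\\
&= \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-2(\theta+\alpha)}-e^{-2(\theta+\alpha_0)}+e^{-2(\theta+\alpha_0)}\big|\\
&\le \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-2(\theta+\alpha)}-e^{-2(\theta+\alpha_0)}\big|+e^{-2(\theta+\alpha_0)}\\
&\le \big|e^{-2(\theta+\alpha_0-\delta)}-e^{-2(\theta+\alpha_0)}\big|+e^{-2(\theta+\alpha_0)},
\end{aligned}
$$

which is bounded, and tends to $e^{-2(\theta+\alpha_0)}$ as $\delta\to0$.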


Lastly, for the log-log link $f'(x)$ becomes

$$f'(x) = e^{-x}e^{-e^{-x}}\big(e^{-x}-1\big).$$


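So,

$$
\begin{aligned}
\sup_{|\alpha-\alpha_0|<\delta}|f'(\theta+\alpha)-f'(\theta+\alpha_0)|
&\le \sup_{|\alpha-\alpha_0|<\delta}\Big|e^{-e^{-(\theta+\alpha_0)}}\big(e^{-2(\theta+\alpha)}-e^{-2(\theta+\alpha_0)}\big)+e^{-2(\theta+\alpha)}\big(e^{-e^{-(\theta+\alpha)}}-e^{-e^{-(\theta+\alpha_0)}}\big)\Big|\\
&\quad+\sup_{|\alpha-\alpha_0|<\delta}\Big|e^{-e^{-(\theta+\alpha_0)}}\big(e^{-(\theta+\alpha_0)}-e^{-(\theta+\alpha)}\big)+e^{-(\theta+\alpha)}\big(e^{-e^{-(\theta+\alpha_0)}}-e^{-e^{-(\theta+\alpha)}}\big)\Big|. \qquad (2.7.12)
\end{aligned}
$$

By calculations similar to those above, the expression in (2.7.12) is bounded and goes to 0 as $\delta\to0$. Also note that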


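$$
\begin{aligned}
\sup_{|\alpha-\alpha_0|<\delta}|f'(\theta+\alpha)|
&= \sup_{|\alpha-\alpha_0|<\delta}\big|e^{-2(\theta+\alpha)}e^{-e^{-(\theta+\alpha)}}-e^{-(\theta+\alpha)}e^{-e^{-(\theta+\alpha)}}\big|\\
&\le \sup_{|\alpha-\alpha_0|<\delta}\big(e^{-2(\theta+\alpha)}+e^{-(\theta+\alpha)}\big)\\
&\le e^{-2(\theta+\alpha_0-\delta)}+e^{-(\theta+\alpha_0-\delta)},
\end{aligned}
$$

using $0<e^{-e^{-x}}\le1$; this bound stays finite as $\delta\to0$, so condition (ii) holds for the log-log link.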
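A similar check for the log-log link (illustrative, not from the original): the sketch confirms by central differences that $f(x)=e^{-x}\exp(-e^{-x})$ is the derivative of $F(x)=\exp(-\exp(-x))$ and that the stated $f'(x)$ is correct, verifies that $f$ never exceeds its maximum $e^{-1}$ (attained at $x=0$), and shows that the moduli of continuity of $f$ and $f'$ are small for $\delta = 0.01$. The grid and the values of $\theta+\alpha_0$ are arbitrary.

```python
import math

def F(x):
    return math.exp(-math.exp(-x))

def f(x):
    # log-log density f(x) = e^{-x} exp(-e^{-x})
    return math.exp(-x) * math.exp(-math.exp(-x))

def f_prime(x):
    # f'(x) = e^{-x} exp(-e^{-x}) (e^{-x} - 1)
    return f(x) * (math.exp(-x) - 1.0)

def central_diff(g, x, h=1e-5):
    return (g(x + h) - g(x - h)) / (2.0 * h)

def modulus(g, u0, delta, m=2001):
    """Grid approximation of sup_{|a - a0| <= delta} |g(theta + a) - g(theta + a0)|, u0 = theta + a0."""
    return max(abs(g(u0 - delta + 2.0 * delta * t / (m - 1)) - g(u0)) for t in range(m))

grid = [-3.0 + 0.05 * t for t in range(241)]  # x in [-3, 9]
print(max(f(x) for x in grid), max(abs(f_prime(x)) for x in grid))
```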


This  shows  that  the  marginal  ML  estimators  are  consistent. 

In this chapter we mainly discussed various issues concerning one-parameter item response models. We showed the consistency of the marginal ML estimates, and obtained sufficient conditions under which improper priors can be uniformly approximated by proper priors. We developed a standard Bayesian procedure to estimate the item parameters in one-parameter models, and discussed the dependence of the Bayes estimates on the standard deviation of the prior for the subject parameters. Our recommendation was to use a hierarchical Bayes procedure in such cases. In the next chapter we discuss hierarchical Bayes estimation procedures for one-parameter item response models.


CHAPTER  3 


HIERARCHICAL  BAYESIAN  ANALYSIS  OF  ITEM  RESPONSE  MODELS  FOR 

BINARY  DATA 

3.1  Introduction 

In the previous chapter, we provided a subjective Bayes analysis of item response models. With a normal prior for the $\theta_i$, this required specification of the prior mean $\mu$ and the prior variance $\sigma^2$. As already noted in the previous chapter, the posterior distribution of the contrasts remains invariant under the specification of the prior mean $\mu$. Hence, assigning a specific value to $\mu$ does not affect inference for the contrasts. However, the choice of $\sigma^2$, the prior variance, does affect the Bayesian inference.

There are situations, however, when there is little or only vague prior information about $\sigma^2$. In such cases, assigning a specific value to $\sigma^2$ may seriously bias the inference, and it may be worthwhile to put a diffuse or nearly diffuse prior on $\sigma^2$ and carry out the Bayesian analysis based on such hierarchical priors.

In Section 3.2 we discuss hierarchical Bayesian models and investigate the propriety of the resulting posteriors. In particular, we state and prove a theorem concerning the propriety of posteriors. We also point out how intuitively appealing hierarchical models can lead to improper posteriors.

Section 3.3 deals with the implementation of hierarchical Bayes methods for general one-parameter item response models. Section 3.4 contains an example based on the responses of 200 examinees to 8 questions in a mathematics placement test. The posterior means and the standard errors of the differences in the difficulty values of these questions are provided for the logit, probit and log-log links. We also provide the marginal maximum likelihood estimates for these questions and compare them with the hierarchical Bayes (HB) estimates.

3.2    Bayes  Procedures  with  Hierarchical  Priors 
Recall the likelihood function of $(\theta,\alpha)$ given in (2.2.1):

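$$L(\theta,\alpha\mid x)=\prod_{i=1}^{n}\prod_{j=1}^{k}\big[F^{x_{ij}}(\theta_i+\alpha_j)\,\bar F^{1-x_{ij}}(\theta_i+\alpha_j)\big], \qquad (3.2.1)$$

where $\bar F = 1-F$.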

In the previous chapter we assumed the $\theta_i$ to be iid $N(\mu,\sigma^2)$, a natural class of priors for ability $\theta$ (Lord and Novick, 1968). In the hierarchical setting considered here, at the first stage suppose the $\theta_i$ are iid normal with mean $\mu$ and variance $\sigma^2$, i.e.,

$$\theta_i\mid\mu,\sigma^2 \overset{iid}{\sim} N(\mu,\sigma^2),\qquad i=1,\ldots,n, \qquad (3.2.2)$$

and $\alpha_j$ is flat, i.e.,

$$\alpha_j \sim \text{Uniform}(-\infty,\infty),\qquad j=1,\ldots,k. \qquad (3.2.3)$$

The conventional hyperprior for $(\mu,\sigma^2)$ assumes a priori independence of $\mu$ and $\sigma^2$, with

$$\mu \sim \text{Uniform}(-\infty,\infty) \qquad (3.2.4)$$

and

$$r=\sigma^{-2}\sim \text{gamma}\big(\tfrac12 a,\tfrac12 b\big), \qquad (3.2.5)$$

where $a>0$ and $b+n-1>0$. Here a random variable $V$ is said to follow a gamma$(a,b)$ distribution if it has the probability density function

$$f(v)=\frac{a^b}{\Gamma(b)}\exp(-av)\,v^{b-1},\qquad v>0. \qquad (3.2.6)$$


This allows the possibility of a negative $b$ as well. However, under these priors the posterior becomes improper, because both the likelihood and the prior are nonidentifiable. The following theorem states this formally.
Theorem 1.

Under the choice of priors described in (3.2.2)-(3.2.5), the posterior $f(\theta,\alpha\mid x)$ is improper.
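Proof.

The joint posterior of $\theta$, $\alpha$, $\mu$ and $r$ is

$$f(\theta,\alpha,\mu,r\mid x)\propto\prod_{i=1}^{n}\prod_{j=1}^{k}\big[F^{x_{ij}}(\theta_i+\alpha_j)\bar F^{1-x_{ij}}(\theta_i+\alpha_j)\big]\times r^{n/2}\exp\Big[-\frac r2\sum_{i=1}^{n}(\theta_i-\mu)^2\Big]\exp\Big(-\frac12 ar\Big)r^{\frac12 b-1}. \qquad (3.2.7)$$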

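First integrating with respect to $\mu$, one gets the joint posterior of $\theta$, $\alpha$ and $r$ as

$$f(\theta,\alpha,r\mid x)\propto\prod_{i=1}^{n}\prod_{j=1}^{k}\big[F^{x_{ij}}(\theta_i+\alpha_j)\bar F^{1-x_{ij}}(\theta_i+\alpha_j)\big]\times\exp\Big[-\frac r2\Big(a+\sum_{i=1}^{n}(\theta_i-\bar\theta)^2\Big)\Big]r^{\frac12(n+b-1)-1}, \qquad (3.2.8)$$

where $\bar\theta=n^{-1}\sum_{i=1}^n\theta_i$.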
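The passage from (3.2.7) to (3.2.8) rests on the Gaussian integral $\int\exp\{-\frac r2\sum_i(\theta_i-\mu)^2\}\,d\mu=(2\pi/(nr))^{1/2}\exp\{-\frac r2\sum_i(\theta_i-\bar\theta)^2\}$; its factor $r^{-1/2}$ turns $r^{n/2}\cdot r^{\frac12 b-1}$ into $r^{\frac12(n+b-1)-1}$. The sketch below checks the identity numerically with the trapezoid rule; the $\theta$ and $r$ values are arbitrary illustrations.

```python
import math

def integral_over_mu(theta, r, half_width=10.0, m=20000):
    """Trapezoid approximation of the integral over mu of exp(-(r/2) * sum_i (theta_i - mu)^2)."""
    center = sum(theta) / len(theta)
    lo = center - half_width
    h = 2.0 * half_width / m

    def g(mu):
        return math.exp(-0.5 * r * sum((t - mu) ** 2 for t in theta))

    total = 0.5 * (g(lo) + g(lo + m * h)) + sum(g(lo + k * h) for k in range(1, m))
    return total * h

def closed_form(theta, r):
    """sqrt(2*pi/(n*r)) * exp(-(r/2) * sum_i (theta_i - theta_bar)^2)."""
    n = len(theta)
    tbar = sum(theta) / n
    ss = sum((t - tbar) ** 2 for t in theta)
    return math.sqrt(2.0 * math.pi / (n * r)) * math.exp(-0.5 * r * ss)

theta = [0.3, -1.2, 0.8, 2.0, -0.5]  # illustrative abilities
for r in (0.5, 1.0, 4.0):
    print(r, integral_over_mu(theta, r), closed_form(theta, r))
```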


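Now make the one-to-one transformation

$$\eta_i=\theta_i+\alpha_k,\ i=1,\ldots,n;\qquad \xi_j=\alpha_j-\alpha_k,\ j=1,\ldots,k-1;\qquad \alpha_k=\alpha_k,\quad r=r. \qquad (3.2.9)$$

Writing $\eta=(\eta_1,\ldots,\eta_n)$, $\xi=(\xi_1,\ldots,\xi_{k-1})$ and $\bar\eta=n^{-1}\sum_{i=1}^n\eta_i$, the joint posterior of $\eta$, $\xi$, $\alpha_k$ and $r$ is given by

$$f(\eta,\xi,\alpha_k,r\mid x)\propto\prod_{i=1}^n\prod_{j=1}^{k-1}\big[F^{x_{ij}}(\eta_i+\xi_j)\bar F^{1-x_{ij}}(\eta_i+\xi_j)\big]\prod_{i=1}^n\big[F^{x_{ik}}(\eta_i)\bar F^{1-x_{ik}}(\eta_i)\big]\times\exp\Big[-\frac r2\Big(a+\sum_{i=1}^n(\eta_i-\bar\eta)^2\Big)\Big]r^{\frac12(n+b-1)-1}, \qquad (3.2.10)$$

which does not involve $\alpha_k$. Hence, integrating over $\alpha_k\in(-\infty,\infty)$, one finds that the joint posterior of $(\eta,\xi,\alpha_k,r)$, and accordingly the joint posterior of $(\theta,\alpha,r)$, is improper. This implies the impropriety of $f(\theta,\alpha\mid x)$.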
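The impropriety comes from a flat direction of the posterior: moving $\alpha_k$ with $\eta$ and $\xi$ held fixed amounts to $\theta_i\to\theta_i+c$ and $\alpha_j\to\alpha_j-c$ for all $i,j$, which leaves every $\theta_i+\alpha_j$ and the sum of squares $\sum_i(\theta_i-\bar\theta)^2$ unchanged. The sketch below, assuming the logit link and randomly generated illustrative data, evaluates the log of the right-hand side of (3.2.8) before and after such shifts; `log_post_328` is a hypothetical helper name.

```python
import math
import random

def log_F(z):
    """log F(z) for the logistic link, overflow-safe."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def log_post_328(theta, alpha, r, x, a, b):
    """Logarithm of the right-hand side of (3.2.8) for the logit link (1 - F(z) = F(-z))."""
    n, k = len(theta), len(alpha)
    ll = sum(log_F(theta[i] + alpha[j]) if x[i][j] else log_F(-(theta[i] + alpha[j]))
             for i in range(n) for j in range(k))
    tbar = sum(theta) / n
    ss = sum((t - tbar) ** 2 for t in theta)
    return ll - 0.5 * r * (a + ss) + (0.5 * (n + b - 1) - 1.0) * math.log(r)

rng = random.Random(0)
n, k = 6, 3
x = [[rng.randint(0, 1) for _ in range(k)] for _ in range(n)]
theta = [rng.gauss(0.0, 1.0) for _ in range(n)]
alpha = [rng.gauss(0.0, 1.0) for _ in range(k)]
base = log_post_328(theta, alpha, r=1.3, x=x, a=0.5, b=1.0)
for c in (-5.0, 0.3, 10.0):
    shifted = log_post_328([t + c for t in theta], [al - c for al in alpha], r=1.3, x=x, a=0.5, b=1.0)
    print(c, shifted - base)  # zero up to rounding: the posterior is constant along this direction
```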

The impropriety of the above posterior is a consequence of the uniform distribution of the location parameter $\mu$ over $(-\infty,\infty)$. If instead one assumes $\mu$ to be known, and assigns a distribution only to $r$ (or $\sigma^2$), the impropriety usually does not occur. Indeed, we pointed out in Chapter 2 that inference about the contrasts $\alpha_j-\alpha_m$ $(1\le j\ne m\le k)$ remains invariant under the choice of the location parameter $\mu$ in the general location-scale family of priors for the $\theta_i$. Hence, we take $\mu=0$, but consider the same prior for $r=\sigma^{-2}$ as given in (3.2.5). In this case the joint posterior of $\theta$, $\alpha$ and $r$ is given by

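$$f(\theta,\alpha,r\mid x)\propto\prod_{i=1}^n\prod_{j=1}^k\big[F^{x_{ij}}(\theta_i+\alpha_j)\bar F^{1-x_{ij}}(\theta_i+\alpha_j)\big]\times\exp\Big[-\frac r2\Big(a+\sum_{i=1}^n\theta_i^2\Big)\Big]r^{\frac12(n+b)-1}. \qquad (3.2.11)$$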

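With the one-to-one transformation given in (3.2.9), one gets the joint posterior of $\eta$, $\xi$, $\alpha_k$ and $r$ as

$$f(\eta,\xi,\alpha_k,r\mid x)\propto\prod_{i=1}^n\prod_{j=1}^{k-1}\big[F^{x_{ij}}(\eta_i+\xi_j)\bar F^{1-x_{ij}}(\eta_i+\xi_j)\big]\prod_{i=1}^n\big[F^{x_{ik}}(\eta_i)\bar F^{1-x_{ik}}(\eta_i)\big]\times\exp\Big[-\frac r2\Big(a+\sum_{i=1}^n(\eta_i-\alpha_k)^2\Big)\Big]r^{\frac12(n+b)-1}. \qquad (3.2.12)$$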


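First integrating with respect to $\alpha_k$, it follows from (3.2.12) that the joint posterior of $\eta$, $\xi$ and $r$ is

$$f(\eta,\xi,r\mid x)\propto\prod_{i=1}^n\prod_{j=1}^{k-1}\big[F^{x_{ij}}(\eta_i+\xi_j)\bar F^{1-x_{ij}}(\eta_i+\xi_j)\big]\prod_{i=1}^n\big[F^{x_{ik}}(\eta_i)\bar F^{1-x_{ik}}(\eta_i)\big]\times\exp\Big[-\frac r2\Big(a+\sum_{i=1}^n(\eta_i-\bar\eta)^2\Big)\Big]r^{\frac12(n+b-1)-1}. \qquad (3.2.13)$$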


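Now integrating with respect to $r$, one gets the joint posterior of $(\eta,\xi)$ as

$$f(\eta,\xi\mid x)\propto\prod_{i=1}^n\prod_{j=1}^{k-1}\big[F^{x_{ij}}(\eta_i+\xi_j)\bar F^{1-x_{ij}}(\eta_i+\xi_j)\big]\prod_{i=1}^n\big[F^{x_{ik}}(\eta_i)\bar F^{1-x_{ik}}(\eta_i)\big]\Big[a+\sum_{i=1}^n(\eta_i-\bar\eta)^2\Big]^{-\frac12(n+b-1)}. \qquad (3.2.14)$$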
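The integration over $r$ that produces (3.2.14) is the gamma integral $\int_0^\infty\exp(-rA/2)\,r^{c-1}\,dr=\Gamma(c)(2/A)^c$, here with $c=\frac12(n+b-1)$ and $A=a+\sum_i(\eta_i-\bar\eta)^2$, which explains the exponent $-\frac12(n+b-1)$. A short numerical check, with illustrative $n=10$, $b=0$ and a few values of $A$ (the trapezoid rule used here needs $c>1$):

```python
import math

def r_integral(A, c, m=100000):
    """Trapezoid approximation of the integral over r > 0 of exp(-r*A/2) * r**(c - 1), for c > 1."""
    upper = 80.0 * c / A  # the integrand is negligible beyond this point

    def g(r):
        return math.exp(-0.5 * r * A) * r ** (c - 1.0)

    h = upper / m
    total = 0.5 * (g(0.0) + g(upper)) + sum(g(j * h) for j in range(1, m))
    return total * h

def closed_form(A, c):
    return math.gamma(c) * (2.0 / A) ** c

n, b = 10, 0.0
c = 0.5 * (n + b - 1)  # exponent appearing in (3.2.13)-(3.2.14)
for A in (3.7, 10.0, 25.0):  # A plays the role of a + sum_i (eta_i - eta_bar)^2
    print(A, r_integral(A, c), closed_form(A, c))
```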


Now, using the arguments of Theorem 3 of the previous chapter, it follows that for the logit, probit and log-log links the posterior of $(\eta,\xi)$ is proper, at least when $k=2$, $0<y_j<n$ for all $j=1,\ldots,k$, and $t_i=1$ for all $i=1,\ldots,n$. We conjecture that for arbitrary $k$, except when $x_{i1}=\cdots=x_{ik}=0$ for at least one $i$ or $x_{1j}=\cdots=x_{nj}=0$ for at least one $j$, the posterior of $(\theta,\alpha)$ given $x$ is proper under the hierarchical model where $\mu$ is fixed and $\sigma^2$ has the inverse gamma prior.

Suppose now we use a hierarchical prior for $\alpha$. More specifically, it is assumed a priori that $\theta$ and $\alpha$ are independent. We assume a prior $\pi_1(\theta)$ for $\theta$. Also, assume that conditional on $(\zeta,\tau^2)$, the $\alpha_j$ are iid $N(\zeta,\tau^2)$, where $\zeta$ and $\tau^2$ are marginally independent with

$$\zeta\sim\text{Uniform}(-\infty,\infty) \qquad (3.2.15)$$

and

$$r=\tau^{-2}\sim\text{gamma}\big(\tfrac12 a,\tfrac12 b\big), \qquad (3.2.16)$$


where $a>0$ and $b+k-1>0$. We now state and prove the following theorem, which establishes the propriety of the marginal posterior of interest, $f(\alpha\mid x)$. Define $t_i=\sum_{j=1}^k x_{ij}$ and $y_j=\sum_{i=1}^n x_{ij}$.

Theorem 2.

Under the priors described above, namely $\pi_1(\theta)$ for $\theta$ and (3.2.15)-(3.2.16) for $\alpha$, and if $1\le y_j\le n-1$ for all $j=1,\ldots,k$, the posterior $f(\alpha\mid x)$ is proper whenever $\pi_1(\theta)$ has a finite moment generating function, for each of the links

(i) $F(x)=[1+\exp(-x)]^{-1}$, (ii) $F(x)=\Phi(x)$, where $\Phi(\cdot)$ is the standard normal distribution function, and (iii) $F(x)=\exp(-\exp(-x))$.

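Proof.

From (3.2.1), the likelihood function for any unspecified distribution function $F(\cdot)$ is given by

$$L(\theta,\alpha\mid x)=\prod_{i=1}^n\prod_{j=1}^k\big[F^{x_{ij}}(\theta_i+\alpha_j)\bar F^{1-x_{ij}}(\theta_i+\alpha_j)\big]. \qquad (3.2.17)$$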
t=ij=i 

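Under the given prior the joint posterior of $\theta$, $\alpha$, $\zeta$ and $r$ is given by

$$f(\theta,\alpha,\zeta,r\mid x)\propto L(\theta,\alpha\mid x)\exp\Big[-\frac r2\Big(a+\sum_{j=1}^k(\alpha_j-\zeta)^2\Big)\Big]r^{\frac12(k+b)-1}\pi_1(\theta). \qquad (3.2.18)$$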


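Integrating (3.2.18) with respect to $\zeta$, we get

$$f(\theta,\alpha,r\mid x)\propto L(\theta,\alpha\mid x)\exp\Big[-\frac r2\Big(a+\sum_{j=1}^k(\alpha_j-\bar\alpha)^2\Big)\Big]r^{\frac12(k+b-1)-1}\pi_1(\theta), \qquad (3.2.19)$$

where $\bar\alpha=k^{-1}\sum_{j=1}^k\alpha_j$.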


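Finally, integrating (3.2.19) with respect to $r$, we get

$$f(\theta,\alpha\mid x)\propto L(\theta,\alpha\mid x)\Big[a+\sum_{j=1}^k(\alpha_j-\bar\alpha)^2\Big]^{-\frac12(b+k-1)}\pi_1(\theta)\le L(\theta,\alpha\mid x)\,a^{-\frac12(b+k-1)}\pi_1(\theta). \qquad (3.2.20)$$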


Now, following the steps of the proof of Theorem 4 of the previous chapter, it can be shown that the posterior is proper whenever $\pi_1(\theta)$ has a finite moment generating function.

3.3    Implementation  Of  Bayes  Procedures 

Consider the general link $p_{ij}=F(\theta_i+\alpha_j)$, $i=1,\ldots,n$, $j=1,\ldots,k$, and the prior $g(\theta,\alpha)\propto\prod_{i=1}^n\pi_1(\theta_i\mid\mu,\sigma^2)\,\pi_3(\sigma^2)\prod_{j=1}^k\pi_2(\alpha_j)$. Both $\pi_2$ and $\pi_3$ can be improper as long as the posterior $f(\theta,\alpha\mid x)$ remains proper. As in the previous chapter, we use Gibbs sampling, introduced by Geman and Geman (1984), to generate samples from the marginal posterior densities.

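Gibbs sampling consists of finding the conditional distribution of every parameter given the remaining parameters and the data. In this case the full conditionals are given by

$$
\begin{aligned}
f(\theta_i\mid\theta_l\,(l\ne i),\sigma^2,\alpha,x)&\propto\prod_{j=1}^k\big[F^{x_{ij}}(\theta_i+\alpha_j)\bar F^{1-x_{ij}}(\theta_i+\alpha_j)\big]\pi_1(\theta_i\mid\mu,\sigma^2); \qquad (3.3.1)\\
f(\alpha_j\mid\alpha_m\,(m\ne j),\sigma^2,\theta,x)&\propto\prod_{i=1}^n\big[F^{x_{ij}}(\theta_i+\alpha_j)\bar F^{1-x_{ij}}(\theta_i+\alpha_j)\big]\pi_2(\alpha_j); \qquad (3.3.2)\\
f(\sigma^2\mid\theta,\alpha,x)&\propto\prod_{i=1}^n\pi_1(\theta_i\mid\mu,\sigma^2)\,\pi_3(\sigma^2); \qquad (3.3.3)
\end{aligned}
$$

$i=1,\ldots,n$; $j=1,\ldots,k$. Note that the full conditional of $\theta_i$ does not involve the remaining $\theta_l$ $(l\ne i)$, and the full conditional of $\alpha_j$ does not involve the remaining $\alpha_m$ $(m\ne j)$.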
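The Gibbs scheme (3.3.1)-(3.3.3) can be sketched for the logit link and prior (3.4.1) as follows. One substitution should be flagged: instead of the adaptive rejection sampling of Gilks and Wild (1992) used in the text, the $\theta_i$ and $\alpha_j$ updates below are random-walk Metropolis steps (Metropolis-within-Gibbs), a different and simpler technique. Under (3.4.1) the update (3.3.3) is conjugate: $r=\sigma^{-2}$ given $\theta$ is gamma with shape $(n+b)/2$ and rate $(a+\sum_i\theta_i^2)/2$, so it is drawn exactly. The function names, step size, defaults ($a=0.01$, $b=0$) and the simulated data are illustrative assumptions.

```python
import math
import random

def log_F(z):
    """log F(z) for the logistic link, overflow-safe; log(1 - F(z)) = log F(-z)."""
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def loglik_theta(i, th, alpha, x):
    # log of prod_j F^{x_ij}(th + alpha_j) (1 - F(th + alpha_j))^{1 - x_ij}
    return sum(log_F(th + a) if x[i][j] else log_F(-(th + a)) for j, a in enumerate(alpha))

def loglik_alpha(j, al, theta, x):
    # log of prod_i F^{x_ij}(theta_i + al) (1 - F(theta_i + al))^{1 - x_ij}
    return sum(log_F(t + al) if x[i][j] else log_F(-(t + al)) for i, t in enumerate(theta))

def gibbs(x, a=0.01, b=0.0, n_iter=1000, step=0.8, seed=0):
    """Metropolis-within-Gibbs sketch for prior (3.4.1); returns (contrasts alpha_j - alpha_k, sigma) draws."""
    rng = random.Random(seed)
    n, k = len(x), len(x[0])
    theta, alpha, r = [0.0] * n, [0.0] * k, 1.0
    draws = []
    for _ in range(n_iter):
        for i in range(n):  # (3.3.1): random-walk Metropolis step, N(0, 1/r) prior
            prop = theta[i] + step * rng.gauss(0.0, 1.0)
            log_ratio = (loglik_theta(i, prop, alpha, x) - 0.5 * r * prop * prop
                         - loglik_theta(i, theta[i], alpha, x) + 0.5 * r * theta[i] * theta[i])
            if math.log(1.0 - rng.random()) < log_ratio:
                theta[i] = prop
        for j in range(k):  # (3.3.2): flat pi_2, so only the likelihood ratio matters
            prop = alpha[j] + step * rng.gauss(0.0, 1.0)
            log_ratio = loglik_alpha(j, prop, theta, x) - loglik_alpha(j, alpha[j], theta, x)
            if math.log(1.0 - rng.random()) < log_ratio:
                alpha[j] = prop
        # (3.3.3): r | theta ~ gamma(shape (n + b)/2, rate (a + sum theta_i^2)/2); gammavariate takes a scale
        rate = 0.5 * (a + sum(t * t for t in theta))
        r = rng.gammavariate(0.5 * (n + b), 1.0 / rate)
        draws.append(([al - alpha[-1] for al in alpha[:-1]], 1.0 / math.sqrt(r)))
    return draws

# Usage on simulated (not placement-test) data: 60 subjects, 4 items
rng = random.Random(42)
true_alpha = [0.5, 1.0, -0.5, 0.0]
theta_true = [rng.gauss(0.0, 1.0) for _ in range(60)]
x = [[int(rng.random() < 1.0 / (1.0 + math.exp(-(t + a)))) for a in true_alpha] for t in theta_true]
draws = gibbs(x, n_iter=600)
kept = draws[200:]
print([round(sum(c[j] for c, _ in kept) / len(kept), 2) for j in range(3)])
```

Posterior means of the stored contrasts after a burn-in correspond to the kind of quantity reported in Tables 3.1-3.3; in practice one would also monitor convergence and tune the step size.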

We use the adaptive rejection sampling of Gilks and Wild (1992) to draw from each full conditional; this requires the full conditionals to be log-concave densities. To establish this log-concavity before using adaptive rejection sampling, we state and prove the following theorem.

Theorem 3.

If $F$, $\bar F=1-F$, $\pi_1$, $\pi_2$ and $\pi_3$ are log-concave, then the full conditionals $f(\theta_i\mid\cdot)$, $f(\alpha_j\mid\cdot)$ and $f(\sigma^2\mid\cdot)$ are all log-concave.

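Proof.

To see the log-concavity of $f(\theta_i\mid\cdot)$, $f(\alpha_j\mid\cdot)$ and $f(\sigma^2\mid\cdot)$, write

$$\log f(\theta_i\mid\theta_l\,(l\ne i),\sigma^2,\alpha,x)=\sum_{j=1}^k\big[x_{ij}\log F(\theta_i+\alpha_j)+(1-x_{ij})\log\bar F(\theta_i+\alpha_j)\big]+\log\pi_1(\theta_i\mid\mu,\sigma^2)+\text{const}. \qquad (3.3.4)$$

If $F$, $\bar F$ and $\pi_1$ are all log-concave, then $f(\theta_i\mid\theta_l\,(l\ne i),\sigma^2,\alpha,x)$ is clearly log-concave, since a nonnegative combination of concave functions is concave. Similarly,

$$\log f(\alpha_j\mid\theta,\alpha_m\,(m\ne j),\sigma^2,x)=\sum_{i=1}^n\big[x_{ij}\log F(\theta_i+\alpha_j)+(1-x_{ij})\log\bar F(\theta_i+\alpha_j)\big]+\log\pi_2(\alpha_j)+\text{const}. \qquad (3.3.5)$$

Hence, if $F$, $\bar F$ and $\pi_2$ are log-concave, $f(\alpha_j\mid\alpha_m\,(m\ne j),\sigma^2,\theta,x)$ is log-concave. The log-concavity of $f(\sigma^2\mid\theta,\alpha,x)$ follows similarly.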
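Theorem 3 needs $F$ and $\bar F=1-F$ to be log-concave. For the three links used in this work this can be spot-checked numerically: the sketch below confirms that second differences of $\log F$ and $\log(1-F)$ are nonpositive on a grid. It is a numerical check on $[-8,8]$, not a proof; the grid and step are arbitrary.

```python
import math

SQRT2 = math.sqrt(2.0)

# (log F, log(1 - F)) for each link, written to avoid overflow and cancellation
LOG_CDFS = {
    "logit": (lambda x: -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x)),
              lambda x: -math.log1p(math.exp(x)) if x <= 0 else -x - math.log1p(math.exp(-x))),
    "probit": (lambda x: math.log(0.5 * math.erfc(-x / SQRT2)),
               lambda x: math.log(0.5 * math.erfc(x / SQRT2))),
    "log-log": (lambda x: -math.exp(-x),
                lambda x: math.log(-math.expm1(-math.exp(-x)))),
}

def max_second_difference(g, lo=-8.0, hi=8.0, h=0.01, m=321):
    """Largest g(x+h) - 2g(x) + g(x-h) over a grid; values <= 0 (up to rounding) indicate concavity."""
    xs = [lo + (hi - lo) * t / (m - 1) for t in range(m)]
    return max(g(x + h) - 2.0 * g(x) + g(x - h) for x in xs)

for link, (log_F, log_Fbar) in LOG_CDFS.items():
    print(link, max_second_difference(log_F), max_second_difference(log_Fbar))
```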


3.4    An  Example 

The data considered in this section are the same mathematics placement test data that we considered in the previous chapter. Altogether there are 32 questions, but we consider only the first 8. Our interest lies in inference about the differences of the difficulty values of these questions, or more specifically in the posterior means and posterior s.d.'s of $\alpha_j-\alpha_8$ $(j=1,\ldots,7)$. Three different links, logit, probit and log-log, are considered. We consider the following hierarchical prior:

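$$
\begin{aligned}
\theta_i\mid\sigma^2&\overset{iid}{\sim}N(0,\sigma^2),\qquad i=1,\ldots,n,\\
\alpha_j&\sim\text{Uniform}(-\infty,\infty),\qquad j=1,\ldots,k,\\
r=\sigma^{-2}&\sim\text{gamma}\big(\tfrac12 a,\tfrac12 b\big). \qquad (3.4.1)
\end{aligned}
$$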

Here $a^{-1}$ represents the scale parameter and $b$ the shape parameter of the distribution of $r$. The mean and variance of this distribution are $b/a$ and $2b/a^2$, respectively. Therefore

$$\text{cv}=\frac{\sqrt{2b}/a}{b/a}=\sqrt{2/b},$$

where cv denotes the coefficient of variation of the distribution of $r$. Hence $a/b$ represents a guess at the location of $\sigma^2$, and $b$ measures the prior precision of this guess. So, to model the uncertainty about $\sigma^2$, one should assign a very small value to $b$, possibly 0. A value of 0 for $b$ implies a noninformative prior for $\sigma^2$. Since using $b=0$ does not prevent the posterior from being proper, we decided to take this value of $b$. We discuss the choice of $a$ at the end of this section.
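The moments quoted above follow from (3.2.6) with parameters $(a/2, b/2)$, i.e. rate $a/2$ and shape $b/2$. The sketch below computes mean, variance and cv from that parametrization and compares them with a Monte Carlo estimate; note that Python's `random.gammavariate` takes a shape and a scale, so the scale passed is $2/a$. The values $a=2$, $b=6$ are illustrative.

```python
import math
import random

def gamma_half_moments(a, b):
    """Mean, variance and cv of r ~ gamma(a/2, b/2) in the parametrization of (3.2.6): rate a/2, shape b/2."""
    shape, rate = 0.5 * b, 0.5 * a
    mean = shape / rate       # = b / a
    var = shape / rate ** 2   # = 2b / a^2
    return mean, var, math.sqrt(var) / mean  # cv = sqrt(2 / b)

def monte_carlo_moments(a, b, size=200000, seed=0):
    rng = random.Random(seed)
    draws = [rng.gammavariate(0.5 * b, 2.0 / a) for _ in range(size)]  # (shape, scale = 1 / rate)
    m = sum(draws) / size
    v = sum((d - m) ** 2 for d in draws) / (size - 1)
    return m, v

print(gamma_half_moments(2.0, 6.0), monte_carlo_moments(2.0, 6.0))
```

As $b\to0$ the cv $\sqrt{2/b}$ grows without bound, which is the sense in which a small $b$ expresses vague prior information about $\sigma^2$.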

The estimate of $\sigma$ for the logit link from the marginal likelihood is 0.891 with standard error 0.109, and the HB estimate of $\sigma$ is 0.881 with standard error 0.090. For the probit link the marginal likelihood estimate of $\sigma$ is 0.511 with standard error 0.049, whereas the HB estimate is 0.510 with standard error 0.043. For the log-log link the marginal likelihood estimate of $\sigma$ is 0.606 with standard error 0.072, and the corresponding HB estimate is 0.589 with standard error 0.064.

We noticed in the previous chapter that, when $\sigma$ is close to its marginal ML estimate, the Bayes estimates are close to the corresponding marginal ML estimates. However, it is reasonable to believe that if we let the data decide on a suitable value for $\sigma$, rather than fixing some arbitrary number, we will get better estimates. This naturally leads to a hierarchical model in which we put a distribution on $\sigma^2$. By doing so, we get the hierarchical Bayes estimates of the item parameters. These estimates are provided in Tables 3.1-3.3. We can see from these tables that the HB estimates of the item difficulty parameters are reasonably close to their marginal ML estimates. This is because the HB estimate of $\sigma$ is very close to its marginal ML estimate, and this holds for all three links. We also checked the robustness of the HB estimates with respect to the choice of $a$ for the gamma distribution. The natural question is how to select $a$. The usual practice is to take $a=0$, giving a noninformative prior for $\sigma^2$ that reflects the uncertainty in the prior selection. But, as mentioned earlier, in our case we could not do that for technical reasons, since otherwise the posterior becomes improper. As the tables show, the HB estimates change very little as $a$ varies from 1 to 0.001. So our recommendation is to use a positive number close to zero.
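The statement that the HB estimates change very little as $a$ moves from 1 to 0.001 can be quantified from the logit entries of Table 3.1. The sketch below computes, for each item, the range of the four HB estimates and compares it with the MMLE standard error; for every item the range is below one tenth of that standard error. The numbers are copied from Table 3.1; the choice of summary is ours.

```python
# Logit-link entries of Table 3.1: item j -> (MMLE, its s.e., HB estimates for a = 1, .5, 0.01, 0.001)
TABLE_3_1 = {
    1: (0.739, 0.179, [0.753, 0.743, 0.745, 0.749]),
    2: (2.163, 0.236, [2.199, 2.178, 2.189, 2.182]),
    3: (1.626, 0.207, [1.651, 1.636, 1.641, 1.640]),
    4: (1.729, 0.212, [1.754, 1.738, 1.751, 1.744]),
    5: (-1.376, 0.200, [-1.386, -1.386, -1.383, -1.379]),
    6: (0.245, 0.172, [0.252, 0.245, 0.253, 0.252]),
    7: (0.220, 0.172, [0.229, 0.225, 0.228, 0.227]),
}

def spread_over_a(table):
    """Range (max - min) of the HB estimates across the four values of a, for each item."""
    return {j: max(hb) - min(hb) for j, (_, _, hb) in table.items()}

spreads = spread_over_a(TABLE_3_1)
for j, s in spreads.items():
    se = TABLE_3_1[j][1]
    print(j, round(s, 3), round(s / se, 3))  # range, and range in units of the MMLE s.e.
```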

Table 3.1 MML and HB estimates of $\alpha_j-\alpha_8$ for the logit link (standard errors in parentheses)

j  MMLE             HB, a = 1        HB, a = .5       HB, a = 0.01     HB, a = 0.001
1  0.739 (0.179)    0.753 (0.182)    0.743 (0.179)    0.745 (0.181)    0.749 (0.182)
2  2.163 (0.236)    2.199 (0.238)    2.178 (0.238)    2.189 (0.241)    2.182 (0.239)
3  1.626 (0.207)    1.651 (0.211)    1.636 (0.210)    1.641 (0.207)    1.640 (0.211)
4  1.729 (0.212)    1.754 (0.215)    1.738 (0.212)    1.751 (0.214)    1.744 (0.214)
5  -1.376 (0.200)   -1.386 (0.202)   -1.386 (0.201)   -1.383 (0.200)   -1.379 (0.204)
6  0.245 (0.172)    0.252 (0.174)    0.245 (0.175)    0.253 (0.176)    0.252 (0.172)
7  0.220 (0.172)    0.229 (0.173)    0.225 (0.172)    0.228 (0.173)    0.227 (0.175)

Table 3.2 MML and HB estimates of $\alpha_j-\alpha_8$ for the probit link (standard errors in parentheses)

j  MMLE             HB, a = 1        HB, a = .5       HB, a = 0.01     HB, a = 0.001
1  0.434 (0.102)    0.453 (0.106)    0.451 (0.108)    0.438 (0.106)    0.434 (0.108)
2  1.259 (0.122)    1.284 (0.130)    1.283 (0.129)    1.270 (0.130)    1.267 (0.131)
3  0.963 (0.112)    0.987 (0.122)    0.981 (0.120)    0.972 (0.120)    0.968 (0.120)
4  1.004 (0.112)    1.024 (0.117)    1.028 (0.120)    1.014 (0.117)    1.007 (0.119)
5  -0.819 (0.114)   -0.817 (0.115)   -0.818 (0.117)   -0.822 (0.116)   -0.822 (0.114)
6  0.138 (0.101)    0.151 (0.105)    0.149 (0.103)    0.144 (0.104)    0.141 (0.103)
7  0.121 (0.100)    0.137 (0.105)    0.133 (0.103)    0.124 (0.103)    0.122 (0.103)

Table 3.3 MML and HB estimates of $\alpha_j-\alpha_8$ for the log-log link (standard errors in parentheses)

j  MMLE             HB, a = 1        HB, a = .5       HB, a = 0.01     HB, a = 0.001
1  0.875 (0.132)    0.892 (0.130)    0.850 (0.134)    0.880 (0.131)    0.884 (0.133)
2  2.033 (0.197)    2.065 (0.200)    2.049 (0.195)    2.049 (0.199)    2.050 (0.197)
3  1.568 (0.165)    1.589 (0.164)    1.582 (0.167)    1.580 (0.164)    1.581 (0.163)
4  1.684 (0.171)    1.712 (0.171)    1.697 (0.170)    1.691 (0.171)    1.700 (0.173)
5  -0.471 (0.110)   -0.466 (0.110)   -0.470 (0.108)   -0.470 (0.106)   -0.469 (0.110)
6  0.513 (0.120)    0.528 (0.118)    0.521 (0.121)    0.512 (0.121)    0.520 (0.121)
7  0.535 (0.121)    0.545 (0.117)    0.542 (0.121)    0.538 (0.120)    0.542 (0.120)

CHAPTER  4 


A  UNIFORM  BAYESIAN  ANALYSIS  OF  TWO-PARAMETER  ITEM 
RESPONSE  MODELS  FOR  BINARY  DATA 

4.1  Introduction 

As mentioned in the introduction, two-parameter item response models involve item discrimination parameters in addition to the subject and item difficulty parameters. In their most general form, the discrimination parameters may vary from item to item. Two-parameter item response models are given either by (1.1.2) or by (1.1.3), i.e.,

$$p_{ij}=F\big(\gamma_j(\theta_i+\alpha_j)\big), \qquad (4.1.1)$$

or alternatively as

$$p_{ij}=F(\gamma_j\theta_i+b_j). \qquad (4.1.2)$$

This chapter addresses the choice of priors for two-parameter item response models. We have discussed in Chapters 2 and 3 how nonidentifiable posteriors can often be improper. We shall see in this chapter that even when the posterior is identifiable, a choice of improper priors can often lead to improper posteriors.

In Section 4.2, we critically examine the choice of priors. Much of the existing work employs noninformative priors for one or more of the sets of parameters $\{\theta_i\}$, $\{\alpha_j\}$ and $\{\gamma_j\}$. This is especially tempting when prior information is vague. However, it opens the possibility of improper posteriors. This impropriety is sometimes due to a nonidentifiable posterior. But, as we show in Section 4.2, impropriety may result even otherwise. For example, it is shown that the priors employed by Albert (1992) and Kim et al. (1994) lead to identifiable but improper posteriors. In this context, we point out that it is not always easy to recognize the impropriety unless one carries out the full integration analytically. For example, in the Markov chain Monte Carlo (MC²) numerical integration technique used by Albert (1992), all the full conditionals are proper, and yet the joint posterior may be improper. For Kim et al. (1994), the posterior modes are finite, but the joint posterior may still be improper.

Our recommendation, therefore, is to use proper priors throughout for the $\theta_i$, $\alpha_j$ and $\gamma_j$. In Section 4.3, we propose a general algorithm for implementing the Bayes procedure via MC² with proper priors. In the final section we illustrate the application of the proposed methodology with a real dataset for three important link functions, namely the logit, probit and log-log.


4.2    Choice  Of  Priors 


Let $x=(x_{11},\ldots,x_{1k},\ldots,x_{n1},\ldots,x_{nk})$, $\theta=(\theta_1,\ldots,\theta_n)$, $\alpha=(\alpha_1,\ldots,\alpha_k)$, $\gamma=(\gamma_1,\ldots,\gamma_k)$ and $b=(b_1,\ldots,b_k)$. For a general link function $F$, based on (4.1.2), the likelihood function is given by


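$$L(\theta,b,\gamma\mid x)=\prod_{i=1}^n\prod_{j=1}^k\big[F^{x_{ij}}(\gamma_j\theta_i+b_j)\bar F^{1-x_{ij}}(\gamma_j\theta_i+b_j)\big]. \qquad (4.2.1)$$

A general class of priors for $(\theta,b,\gamma)$ is given by

$$\pi(\theta,b,\gamma)=\prod_{i=1}^n g_{1i}(\theta_i)\prod_{j=1}^k g_{2j}(b_j)\prod_{j=1}^k g_{3j}(\gamma_j). \qquad (4.2.2)$$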


A basic question here is the selection of the priors $\{g_{1i}\}$, $\{g_{2j}\}$ and $\{g_{3j}\}$. Much of the existing literature on this topic recommends using noninformative priors for one or more of the sets of parameters $\{b_j\}$ and $\{\gamma_j\}$. For example, for the two-parameter logistic model with the formulation given in (4.1.2), Swaminathan and Gifford (1985) recommended using flat priors for the {…} and the $\{b_j\}$ (see their equations (12) and (13), and note the difference from our notation). Albert (1992) recommended using flat priors for the $\{b_j\}$ and $\{\gamma_j\}$ for the two-parameter probit model with the formulation given in (4.1.2). It turns out that such priors lead to improper posteriors.

We begin with a theorem which establishes the impropriety of the posterior when at least one of the $g_{3j}$ is improper.
Theorem 1.

Consider the prior given in (4.2.2), where at least one of the $g_{3j}$ is improper. Then the joint posterior of $(\theta,b,\gamma)$ given $x$ is improper.

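Proof.

The joint posterior of $(\theta, b, \gamma)$ given $x$ is

$$\begin{aligned} \pi(\theta, b, \gamma \mid x) &\propto L(\theta, b, \gamma \mid x) \prod_{i=1}^{n} g_{1i}(\theta_i) \prod_{j=1}^{k} g_{2j}(b_j) \prod_{j=1}^{k} g_{3j}(\gamma_j) \\ &= \prod_{i=1}^{n} \Big\{ \prod_{j=1}^{k} \big[ F^{x_{ij}}(\gamma_j \theta_i + b_j)\, \bar F^{1-x_{ij}}(\gamma_j \theta_i + b_j) \big] g_{1i}(\theta_i) \Big\} \times \prod_{j=1}^{k} g_{2j}(b_j) \prod_{j=1}^{k} g_{3j}(\gamma_j). \end{aligned} \tag{4.2.3}$$

Write

$$I_i = \int_{-\infty}^{\infty} \prod_{j=1}^{k} \big[ F^{x_{ij}}(\gamma_j \theta_i + b_j)\, \bar F^{1-x_{ij}}(\gamma_j \theta_i + b_j) \big] g_{1i}(\theta_i)\, d\theta_i \quad (i = 1, \ldots, n). \tag{4.2.4}$$

Then

$$\pi(b, \gamma \mid x) \propto \Big( \prod_{i=1}^{n} I_i \Big) \prod_{j=1}^{k} g_{2j}(b_j) \prod_{j=1}^{k} g_{3j}(\gamma_j). \tag{4.2.5}$$

For definiteness, assume that $g_{31}(\gamma_1)$ is improper. For each fixed $i$, $x_{i1} = 1$ or $0$. Hence each integral $I_i$ contains a term $F(\gamma_1 \theta_i + b_1)$ or $\bar F(\gamma_1 \theta_i + b_1)$, but not both. In the first case, since $F(\gamma_1 \theta_i + b_1) \ge F(b_1)$ for $\theta_i \ge 0$ and $\gamma_1 > 0$,

$$\begin{aligned} I_i &\ge \int_{0}^{\infty} \prod_{j=1}^{k} \big[ F^{x_{ij}}(\gamma_j \theta_i + b_j)\, \bar F^{1-x_{ij}}(\gamma_j \theta_i + b_j) \big] g_{1i}(\theta_i)\, d\theta_i \\ &\ge F(b_1) \int_{0}^{\infty} \prod_{j=2}^{k} \big[ F^{x_{ij}}(\gamma_j \theta_i + b_j)\, \bar F^{1-x_{ij}}(\gamma_j \theta_i + b_j) \big] g_{1i}(\theta_i)\, d\theta_i. \end{aligned} \tag{4.2.6}$$

In the second case, since $\bar F(\gamma_1 \theta_i + b_1) \ge \bar F(b_1)$ for $\theta_i \le 0$ and $\gamma_1 > 0$,

$$\begin{aligned} I_i &\ge \int_{-\infty}^{0} \prod_{j=1}^{k} \big[ F^{x_{ij}}(\gamma_j \theta_i + b_j)\, \bar F^{1-x_{ij}}(\gamma_j \theta_i + b_j) \big] g_{1i}(\theta_i)\, d\theta_i \\ &\ge \bar F(b_1) \int_{-\infty}^{0} \prod_{j=2}^{k} \big[ F^{x_{ij}}(\gamma_j \theta_i + b_j)\, \bar F^{1-x_{ij}}(\gamma_j \theta_i + b_j) \big] g_{1i}(\theta_i)\, d\theta_i. \end{aligned} \tag{4.2.7}$$

This shows that we can get a lower bound for $\prod_{i=1}^{n} I_i$ which is positive and does not involve $\gamma_1$. Integrating this lower bound with respect to the (improper) prior density $g_{31}$ of $\gamma_1$ over $(0, \infty)$, one gets

$$\int_{0}^{\infty} \Big( \prod_{i=1}^{n} I_i \Big) g_{31}(\gamma_1)\, d\gamma_1 = +\infty.$$

This implies that the posterior $\pi(b, \gamma \mid x)$ is improper, and hence $\pi(\theta, b, \gamma \mid x)$ is also improper.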
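The lower bounds (4.2.6) and (4.2.7) can be checked numerically. The sketch below is only an illustration under assumptions of our own: the probit link, an $N(0, 1)$ ability prior $g_{1i}$, a hypothetical 4 × 2 response matrix, and fixed values of $b$ and $\gamma_2$; the helper names (`I_i`, `bound_i`, and so on) are ours. It shows that $\prod_i I_i$ stays above a positive constant that never involves $\gamma_1$, however large $\gamma_1$ becomes.

```python
import math

def Phi(z):
    # probit link F: standard normal cdf (F-bar = 1 - F)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def phi(t):
    # N(0, 1) density, used here as the ability prior g_1i
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def term(x, z):
    # F^x(z) * Fbar^(1 - x)(z) for a single binary response x
    return Phi(z) if x == 1 else 1.0 - Phi(z)

def integrate(f, lo, hi, m=4000):
    # composite midpoint rule on [lo, hi]
    h = (hi - lo) / m
    return h * sum(f(lo + (s + 0.5) * h) for s in range(m))

def I_i(x_row, gamma, b, lim=8.0):
    # I_i of (4.2.4), with the theta range truncated to [-lim, lim]
    def f(t):
        return phi(t) * math.prod(term(x, g * t + bj) for x, g, bj in zip(x_row, gamma, b))
    return integrate(f, -lim, lim)

def bound_i(x_row, gamma, b, lim=8.0):
    # right-hand side of (4.2.6) if x_i1 = 1, of (4.2.7) if x_i1 = 0;
    # gamma[0] (that is, gamma_1) is never used
    def rest(t):
        return phi(t) * math.prod(
            term(x, g * t + bj) for x, g, bj in zip(x_row[1:], gamma[1:], b[1:]))
    if x_row[0] == 1:
        return Phi(b[0]) * integrate(rest, 0.0, lim)
    return (1.0 - Phi(b[0])) * integrate(rest, -lim, 0.0)

# Hypothetical toy data: 4 subjects, 2 items; b and gamma_2 are held fixed.
x = [[1, 0], [0, 1], [1, 1], [0, 0]]
b = [0.2, -0.3]
gamma_2 = 1.0

lower = math.prod(bound_i(row, [1.0, gamma_2], b) for row in x)
integrated = {g1: math.prod(I_i(row, [g1, gamma_2], b) for row in x)
              for g1 in (0.5, 1.0, 5.0, 50.0, 500.0)}
for g1, value in integrated.items():
    print(f"gamma_1 = {g1:6.1f}: prod I_i = {value:.5f} >= {lower:.5f}")
```

Because the product never falls below the same positive constant, integrating it against a flat $g_{31}$ over $(0, \infty)$ diverges, which is the content of the last step of the proof.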

Remark 1. Albert (1992) considered the special case where $F = \Phi$, the $\theta_i$ are iid $N(0, 1)$, and $g_{2j}(b_j) \propto 1$, $g_{3j}(\gamma_j) \propto 1$ for $j = 1, \ldots, k$. As a consequence of Theorem 1, Albert's prior leads to an improper posterior.

Remark 2. Note, however, that all the full conditionals are proper in this case, so the impropriety of the posterior may go undetected by the MC² technique. To see this, notice that

$$\pi(\theta_i \mid \theta_l\ (l \ne i), b, \gamma, x) \propto \prod_{j=1}^{k} \big[ F^{x_{ij}}(\gamma_j \theta_i + b_j)\, \bar F^{1-x_{ij}}(\gamma_j \theta_i + b_j) \big] g_{1i}(\theta_i),$$

which is typically integrable with respect to $\theta_i$. Similarly, $\pi(b_j \mid b_m\ (m \ne j), \theta, \gamma, x)$ and $\pi(\gamma_j \mid \gamma_m\ (m \ne j), \theta, b, x)$ may all be proper. Thus it is possible to generate samples from these proper full conditionals without detecting the impropriety of the posterior.
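Remark 2 can be made concrete: under a flat prior the full conditional of $\gamma_1$ is integrable for a typical Gibbs state even though the joint posterior is not. The sketch below assumes the probit link and a hypothetical current state ($\theta_i$, $b_1$, $x_{i1}$) of our own choosing; the normalizing mass of the conditional settles to a finite value as the integration range grows.

```python
import math

def Phi(z):
    # probit link: standard normal cdf
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Hypothetical current Gibbs state for item 1: abilities theta_i, b_1 and responses x_i1.
# theta = -0.4 with x = 1, and theta = 0.7 with x = 0, make the kernel decay in gamma_1.
theta = [1.2, -0.4, 0.7, -1.1, 0.3]
x1 = [1, 1, 0, 0, 1]
b1 = 0.1

def kernel(g):
    # unnormalised full conditional of gamma_1 under a flat prior on (0, inf)
    out = 1.0
    for t, x in zip(theta, x1):
        p = Phi(g * t + b1)
        out *= p if x == 1 else 1.0 - p
    return out

def mass_up_to(G, h=0.002):
    # midpoint rule on (0, G) with a fixed step, so a longer range only adds terms
    m = round(G / h)
    return h * sum(kernel((s + 0.5) * h) for s in range(m))

masses = {G: mass_up_to(G) for G in (10.0, 20.0, 40.0, 80.0)}
for G, mass in masses.items():
    print(f"integral over (0, {G:4.0f}) = {mass:.10f}")
```

A Gibbs sampler built on such conditionals therefore runs without complaint, which is why impropriety has to be settled analytically, as in Theorem 1.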
Remark 3. The impropriety of the posterior continues to hold under Albert's (1992) prior with the parametrization given in (4.1.1). To see this, all one needs to note is the one-to-one correspondence between $(\theta, a, \gamma)$ and $(\theta, b, \gamma)$.

Swaminathan and Gifford (1985, p. 353) suggested that, with flat priors for $\theta$, $a$ and $\gamma$, the Bayesian analysis is equivalent to a likelihood-based analysis. As we show now, however, the Bayesian analysis then becomes questionable because the posterior is improper.

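The above fact is a consequence of a more general result, which is proved below.

Theorem 2.

Consider the prior $\pi(\theta, a, \gamma) = g(\gamma)$ for $\theta$, $a$ and $\gamma$, where $g$ is an arbitrary positive function of $\gamma$. Then the posterior $\pi(\theta, a, \gamma \mid x)$ is always improper.

Proof.

First write

$$\pi(\theta, a, \gamma \mid x) \propto L(\theta, a, \gamma \mid x)\, \pi(\theta, a, \gamma) = \prod_{i=1}^{n} \prod_{j=1}^{k} \big[ F^{x_{ij}}(\gamma_j(\theta_i + a_j))\, \bar F^{1-x_{ij}}(\gamma_j(\theta_i + a_j)) \big] g(\gamma). \tag{4.2.8}$$

With the one-to-one transformation

$$\eta_i = \theta_i + a_k \ (i = 1, \ldots, n), \qquad \xi_j = a_j - a_k \ (j = 1, \ldots, k-1), \qquad a_k^* = a_k, \tag{4.2.9}$$

the joint posterior of $\eta = (\eta_1, \ldots, \eta_n)$, $\xi = (\xi_1, \ldots, \xi_{k-1})$, $a_k^*$ and $\gamma$ is given by

$$\pi(\eta, \xi, a_k^*, \gamma \mid x) \propto \prod_{i=1}^{n} \prod_{j=1}^{k-1} \big[ F^{x_{ij}}(\gamma_j(\eta_i + \xi_j))\, \bar F^{1-x_{ij}}(\gamma_j(\eta_i + \xi_j)) \big] \prod_{i=1}^{n} \big[ F^{x_{ik}}(\gamma_k \eta_i)\, \bar F^{1-x_{ik}}(\gamma_k \eta_i) \big] g(\gamma). \tag{4.2.10}$$

The above posterior does not depend on $a_k^*$. Hence, integrating over $a_k^*$, which ranges over $(-\infty, \infty)$,

$$\int_{-\infty}^{\infty} \pi(\eta, \xi, a_k^*, \gamma \mid x)\, da_k^* = +\infty. \tag{4.2.11}$$

This implies the impropriety of the joint posterior of $(\eta, \xi, a_k^*, \gamma)$ and, equivalently, of that of $(\theta, a, \gamma)$.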
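The invariance behind (4.2.10) and (4.2.11) is easy to confirm numerically: shifting every $\theta_i$ by $c$ and every $a_j$ by $-c$ leaves the likelihood unchanged, so only $a_k^*$ moves under the map (4.2.9). The sketch assumes the logit link and hypothetical parameter values and responses; any link $F$ shows the same invariance.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_lik(theta, a, gamma, x, F=logistic):
    # log of the likelihood in (4.2.8): F evaluated at gamma_j * (theta_i + a_j)
    total = 0.0
    for i, t in enumerate(theta):
        for j, (aj, gj) in enumerate(zip(a, gamma)):
            p = F(gj * (t + aj))
            total += math.log(p) if x[i][j] == 1 else math.log(1.0 - p)
    return total

def to_eta_xi(theta, a):
    # the one-to-one map (4.2.9): eta_i = theta_i + a_k, xi_j = a_j - a_k, a_k* = a_k
    ak = a[-1]
    return [t + ak for t in theta], [aj - ak for aj in a[:-1]], ak

# Hypothetical parameter values and responses (3 subjects, 3 items).
theta = [0.3, -1.0, 1.5]
a = [0.2, -0.5, 0.8]
gamma = [1.0, 0.7, 1.3]
x = [[1, 0, 1], [0, 0, 1], [1, 1, 1]]

base = log_lik(theta, a, gamma, x)
changes = {}
for c in (0.5, -3.0, 10.0):
    shifted = log_lik([t + c for t in theta], [aj - c for aj in a], gamma, x)
    changes[c] = shifted - base
    print(f"c = {c:5.1f}: change in log-likelihood = {changes[c]:.2e}")

eta0, xi0, ak0 = to_eta_xi(theta, a)
eta1, xi1, ak1 = to_eta_xi([t + 10.0 for t in theta], [aj - 10.0 for aj in a])
```

The pair $(\eta, \xi)$ is the same before and after the shift, while $a_k^*$ changes by $-c$; this flat direction is exactly what makes the integral in (4.2.11) infinite.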

Kim et al. (1994) considered a prior for $(\theta, a, \gamma)$ that lies outside the class of priors considered in (4.1.1). However, their prior also leads to an improper posterior. Kim et al. (1994) considered the logit link; as we show now, the impropriety holds under an arbitrary link. We point out, though, that Kim et al. (1994) were primarily interested in the issue of joint versus marginal modes, and these modes are finite even though the posterior is improper.

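Kim et al. consider the following hierarchical prior: $\theta_i$ $(i = 1, \ldots, n)$, $a_j$ $(j = 1, \ldots, k)$ and $\beta_j = \log \gamma_j$ $(j = 1, \ldots, k)$ are mutually independent, and

$$\begin{aligned} \theta_i &\sim N(0, 1), \\ a_j \mid \mu_a, \sigma_a^2 &\sim N(\mu_a, \sigma_a^2), \\ \log \gamma_j = \beta_j \mid \mu_\beta, \sigma_\beta^2 &\sim N(\mu_\beta, \sigma_\beta^2), \\ \mu_a \text{ and } \mu_\beta &\text{ are uniform on } (-\infty, \infty), \\ \tau_a = (\sigma_a^2)^{-1} &\sim \text{Gamma}(\nu_a, \lambda_a), \\ \tau_\beta = (\sigma_\beta^2)^{-1} &\sim \text{Gamma}(\nu_\beta, \lambda_\beta). \end{aligned} \tag{4.2.12}$$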

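Theorem 3.

Under the choice of priors given in (4.2.12), the posterior $f(\theta, a, \beta \mid x)$ is improper.

Proof.

Note that the marginal priors for $a$ and $\beta$, after integrating out the nuisance parameters $\mu_a$, $\mu_\beta$, $\tau_a$ and $\tau_\beta$, are given by

$$\begin{aligned} \pi(a \mid \nu_a, \lambda_a) &\propto \Big\{ \frac{2}{\lambda_a} + \sum_{j=1}^{k} (a_j - \bar a)^2 \Big\}^{-(k + 2\nu_a - 1)/2}, \\ \pi(\beta \mid \nu_\beta, \lambda_\beta) &\propto \Big\{ \frac{2}{\lambda_\beta} + \sum_{j=1}^{k} (\beta_j - \bar \beta)^2 \Big\}^{-(k + 2\nu_\beta - 1)/2}. \end{aligned} \tag{4.2.13}$$

Then the joint posterior of $\theta$, $a$ and $\beta$ is given by

$$\begin{aligned} f(\theta, a, \beta \mid x) \propto{}& \prod_{i=1}^{n} \prod_{j=1}^{k} \big[ F^{x_{ij}}(e^{\beta_j}(\theta_i + a_j))\, \bar F^{1-x_{ij}}(e^{\beta_j}(\theta_i + a_j)) \big] \\ &\times \exp\Big( -\frac{1}{2} \sum_{i=1}^{n} \theta_i^2 \Big) \Big\{ \frac{2}{\lambda_a} + \sum_{j=1}^{k} (a_j - \bar a)^2 \Big\}^{-(k + 2\nu_a - 1)/2} \Big\{ \frac{2}{\lambda_\beta} + \sum_{j=1}^{k} (\beta_j - \bar \beta)^2 \Big\}^{-(k + 2\nu_\beta - 1)/2}. \end{aligned} \tag{4.2.14}$$

Use the transformation $\zeta_j = \beta_j - \beta_k$ $(j = 1, \ldots, k-1)$, set $\zeta_k = 0$, and let $\bar\zeta = k^{-1} \sum_{j=1}^{k} \zeta_j$. Since $\beta_j - \bar\beta = \zeta_j - \bar\zeta$ for every $j$, the joint posterior of $\theta$, $a$, $\zeta = (\zeta_1, \ldots, \zeta_{k-1})$ and $\beta_k$ is given by

$$\begin{aligned} f(\theta, a, \zeta, \beta_k \mid x) \propto{}& \prod_{i=1}^{n} \prod_{j=1}^{k-1} \big[ F^{x_{ij}}(e^{\zeta_j + \beta_k}(\theta_i + a_j))\, \bar F^{1-x_{ij}}(e^{\zeta_j + \beta_k}(\theta_i + a_j)) \big] \\ &\times \prod_{i=1}^{n} \big[ F^{x_{ik}}(e^{\beta_k}(\theta_i + a_k))\, \bar F^{1-x_{ik}}(e^{\beta_k}(\theta_i + a_k)) \big] \\ &\times \exp\Big( -\frac{1}{2} \sum_{i=1}^{n} \theta_i^2 \Big) \Big\{ \frac{2}{\lambda_a} + \sum_{j=1}^{k} (a_j - \bar a)^2 \Big\}^{-(k + 2\nu_a - 1)/2} \Big\{ \frac{2}{\lambda_\beta} + \sum_{j=1}^{k} (\zeta_j - \bar \zeta)^2 \Big\}^{-(k + 2\nu_\beta - 1)/2}. \end{aligned} \tag{4.2.15}$$

Denote by $P(\theta, a, \zeta)$ the product of the three prior factors on the last line of (4.2.15); none of them involves $\beta_k$. Note that for $\theta_i > 0$ $(i = 1, \ldots, n)$ and $a_j > 0$ $(j = 1, \ldots, k)$ the arguments of $F$ and $\bar F$ above are positive, and since $F/\bar F$ is nondecreasing,

$$\frac{F(e^{\zeta_j + \beta_k}(\theta_i + a_j))}{\bar F(e^{\zeta_j + \beta_k}(\theta_i + a_j))} \ge \frac{F(0)}{\bar F(0)}.$$

Writing $F^{x}\bar F^{1-x}(u) = \bar F(u)\{F(u)/\bar F(u)\}^{x}$ and $c_0 = \{F(0)/\bar F(0)\}^{\sum_{i=1}^{n}\sum_{j=1}^{k} x_{ij}} > 0$ (equal to 1 for symmetric links such as the logit), it follows that, for this set of parameter values and $\beta_k < 0$, the right-hand side of (4.2.15) satisfies

$$\begin{aligned} &\ge c_0 \prod_{i=1}^{n} \prod_{j=1}^{k-1} \bar F(e^{\zeta_j + \beta_k}(\theta_i + a_j)) \prod_{i=1}^{n} \bar F(e^{\beta_k}(\theta_i + a_k)) \times P(\theta, a, \zeta) \\ &\ge c_0 \prod_{i=1}^{n} \prod_{j=1}^{k-1} \bar F(e^{\zeta_j}(\theta_i + a_j)) \prod_{i=1}^{n} \bar F(\theta_i + a_k) \times P(\theta, a, \zeta), \end{aligned} \tag{4.2.16}$$

where the second inequality holds because $\bar F$ is nonincreasing and $e^{\beta_k} < 1$.

The right-hand side of (4.2.16) does not depend on $\beta_k$. Integrating it with respect to $\beta_k$ over $(-\infty, 0)$, it follows that, whenever $\theta_i > 0$ $(i = 1, \ldots, n)$ and $a_j > 0$ $(j = 1, \ldots, k)$,

$$\int_{-\infty}^{0} f(\theta, a, \zeta, \beta_k \mid x)\, d\beta_k = +\infty.$$

Since this set of $(\theta, a)$ values has positive Lebesgue measure, the joint posterior of $(\theta, a, \zeta, \beta_k)$, and equivalently that of $(\theta, a, \beta)$, is improper. This proves the theorem.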
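The two monotonicity facts used to obtain (4.2.16) can be spot-checked on a grid: for $u_0 > 0$ (playing the role of $e^{\zeta_j}(\theta_i + a_j)$) and $\beta_k < 0$, one needs $F^{x}\bar F^{1-x}(e^{\beta_k}u_0) \ge \{F(0)/\bar F(0)\}^{x}\,\bar F(u_0)$, with $\bar F = 1 - F$. The sketch assumes the logit link and the complementary log-log link (chosen because $F(0) \ne \bar F(0)$ there); the grid values are arbitrary, and a grid check illustrates rather than proves the inequality.

```python
import math

def F_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def Fbar_logit(z):
    return 1.0 / (1.0 + math.exp(z))

def F_cloglog(z):
    # complementary log-log cdf, F(0) = 1 - exp(-1)
    return -math.expm1(-math.exp(z))

def Fbar_cloglog(z):
    return math.exp(-math.exp(z))

def bound_holds(F, Fbar, x, u0, beta_k):
    # u0 > 0 stands for exp(zeta_j) * (theta_i + a_j); beta_k < 0
    u = math.exp(beta_k) * u0
    lhs = F(u) if x == 1 else Fbar(u)
    ratio0 = F(0.0) / Fbar(0.0)
    return lhs >= ratio0 ** x * Fbar(u0)

checks = [
    bound_holds(F, Fbar, x, u0, beta_k)
    for F, Fbar in ((F_logit, Fbar_logit), (F_cloglog, Fbar_cloglog))
    for x in (0, 1)
    for u0 in (0.01, 0.5, 1.0, 3.0, 6.0)
    for beta_k in (-0.01, -1.0, -5.0, -30.0)
]
print(all(checks), len(checks))
```

The bound on the right does not contain $\beta_k$ at all, which is why the integral over $\beta_k \in (-\infty, 0)$ cannot be finite.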

The above theorems show that noninformative priors for problems of this type often lead to improper posteriors. We have therefore used proper priors in the remainder of this chapter and in the analysis of the dataset introduced in the earlier chapters.


4.3    Implementation  Of  Bayes  Procedures


Consider the general link $P(x_{ij} = 1 \mid \eta_i, b_j, \gamma_j) = F(\gamma_j \eta_i + b_j)$, $i = 1, \ldots, n$, $j = 1, \ldots, k$, and the prior $g(\eta, b, \gamma) \propto \prod_{i=1}^{n} g_1(\eta_i) \prod_{j=1}^{k} g_2(b_j) g_3(\gamma_j)$. Because the prior is not conjugate, the posterior is analytically intractable and can be evaluated only by numerical integration. Moreover, direct numerical integration appears infeasible here because of the high dimensionality of the problem.

Fortunately, the integration task has become easier with the advent of sophisticated Markov chain Monte Carlo (MC²) integration techniques. As before, we use Gibbs sampling. In this case the full conditionals are given by

$$f(\eta_i \mid \eta_l\ (l \ne i), b, \gamma, x) \propto \prod_{j=1}^{k} \big[ F^{x_{ij}}(\gamma_j \eta_i + b_j)\, \bar F^{1-x_{ij}}(\gamma_j \eta_i + b_j) \big] g_1(\eta_i); \tag{4.3.1}$$

$$f(b_j \mid b_m\ (m \ne j), \eta, \gamma, x) \propto \prod_{i=1}^{n} \big[ F^{x_{ij}}(\gamma_j \eta_i + b_j)\, \bar F^{1-x_{ij}}(\gamma_j \eta_i + b_j) \big] g_2(b_j); \tag{4.3.2}$$

and

$$f(\gamma_j \mid \gamma_s\ (s \ne j), \eta, b, x) \propto \prod_{i=1}^{n} \big[ F^{x_{ij}}(\gamma_j \eta_i + b_j)\, \bar F^{1-x_{ij}}(\gamma_j \eta_i + b_j) \big] g_3(\gamma_j); \tag{4.3.3}$$

$i = 1, \ldots, n$; $j = 1, \ldots, k$. Note that the full conditional of $\eta_i$ does not involve the remaining $\eta_l$ $(l \ne i)$. Similarly, the full conditional of $b_j$ does not involve the remaining $b_m$ $(m \ne j)$, and the same holds for the full conditional of $\gamma_j$. Notice, however, that the full conditionals are typically nonstandard densities from which it is not possible to draw samples directly. The general procedure for generating samples in such cases is the Metropolis-Hastings accept-reject algorithm. If, however, $F$, $\bar F$, $g_1$, $g_2$ and $g_3$ are log-concave, then the full conditionals $f(\eta_i \mid \cdot)$, $f(b_j \mid \cdot)$ and $f(\gamma_j \mid \cdot)$ are all log-concave, and we can use the adaptive rejection sampling of Gilks and Wild (1992).
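The text samples these full conditionals with adaptive rejection sampling (Gilks and Wild, 1992). The sketch below instead uses the simpler Metropolis-Hastings route also mentioned above, in its random-walk Metropolis-within-Gibbs form, so it is a substitute technique and not the authors' sampler. It assumes the probit link, hypothetical priors $g_1 = N(0,1)$, $g_2 = N(0, \tau^2)$ and a half-normal $g_3$, and simulated data; the step size, chain length, burn-in and all names are our own choices.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def log_term(x, z):
    # log of F^x(z) * Fbar^(1 - x)(z) for the probit link; -inf if it underflows
    p = 0.5 * math.erfc((-z if x == 1 else z) / SQRT2)
    return math.log(p) if p > 0.0 else -math.inf

# Log full conditionals (4.3.1)-(4.3.3), up to additive constants. As noted in the
# text, the conditional of eta_i does not involve the other eta_l.
def lfc_eta(v, i, eta, b, gam, x):
    return sum(log_term(x[i][j], gam[j] * v + b[j]) for j in range(len(b))) - 0.5 * v * v

def lfc_b(v, j, eta, b, gam, x, tau2=1000.0):
    return sum(log_term(x[i][j], gam[j] * e + v) for i, e in enumerate(eta)) - v * v / (2.0 * tau2)

def lfc_gam(v, j, eta, b, gam, x, sigma=1.0):
    if v <= 0.0:
        return -math.inf  # the half-normal prior puts no mass on gamma_j <= 0
    return sum(log_term(x[i][j], v * e + b[j]) for i, e in enumerate(eta)) - v * v / (2.0 * sigma ** 2)

def mh_update(cur, logf, step, rng):
    # one random-walk Metropolis step whose target density is proportional to exp(logf)
    prop = cur + rng.gauss(0.0, step)
    lp_prop = logf(prop)
    if lp_prop == -math.inf:
        return cur
    if math.log(1.0 - rng.random()) < lp_prop - logf(cur):
        return prop
    return cur

def sweep(eta, b, gam, x, rng, step=0.5):
    # one systematic Gibbs scan; each coordinate is updated by a Metropolis step
    for i in range(len(eta)):
        eta[i] = mh_update(eta[i], lambda v: lfc_eta(v, i, eta, b, gam, x), step, rng)
    for j in range(len(b)):
        b[j] = mh_update(b[j], lambda v: lfc_b(v, j, eta, b, gam, x), step, rng)
        gam[j] = mh_update(gam[j], lambda v: lfc_gam(v, j, eta, b, gam, x), step, rng)

def col_means(rows):
    return [sum(col) / len(col) for col in zip(*rows)]

def run(x, n_sweeps=400, burn=100, seed=1):
    rng = random.Random(seed)
    n, k = len(x), len(x[0])
    eta, b, gam = [0.0] * n, [0.0] * k, [1.0] * k
    draws_b, draws_g = [], []
    for s in range(n_sweeps):
        sweep(eta, b, gam, x, rng)
        if s >= burn:
            draws_b.append(list(b))
            draws_g.append(list(gam))
    return col_means(draws_b), col_means(draws_g), draws_g

# Hypothetical simulated data: 100 subjects, 3 items, probit link.
data_rng = random.Random(0)
true_b, true_g = [1.0, 0.0, -1.0], [1.0, 1.5, 0.8]
x = []
for _ in range(100):
    e = data_rng.gauss(0.0, 1.0)
    x.append([int(data_rng.random() < 0.5 * math.erfc(-(g * e + bj) / SQRT2))
              for bj, g in zip(true_b, true_g)])

b_mean, g_mean, draws_g = run(x)
print("posterior means of b:    ", [round(v, 2) for v in b_mean])
print("posterior means of gamma:", [round(v, 2) for v in g_mean])
```

Random-walk Metropolis needs a tuned step size; when $F$, $\bar F$ and the priors are log-concave, adaptive rejection sampling avoids that tuning, which is the motivation for the approach taken in the text.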
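Theorem 4.

If $F$ is an IFR distribution function and $g_1$, $g_2$ and $g_3$ are log-concave densities, then the full conditionals $f(\eta_i \mid \cdot)$, $f(b_j \mid \cdot)$ and $f(\gamma_j \mid \cdot)$ are all log-concave.

Proof.

To see the log-concavity of $f(\eta_i \mid \cdot)$, $f(b_j \mid \cdot)$ and $f(\gamma_j \mid \cdot)$, we write

$$\log f(\eta_i \mid \eta_l\ (l \ne i), b, \gamma, x) = \sum_{j=1}^{k} \big[ x_{ij} \log F(\gamma_j \eta_i + b_j) + (1 - x_{ij}) \log \bar F(\gamma_j \eta_i + b_j) \big] + \log g_1(\eta_i) + \text{const}. \tag{4.3.4}$$

If $F$, $\bar F$ and $g_1$ are all log-concave, then clearly $f(\eta_i \mid \eta_l\ (l \ne i), b, \gamma, x)$ is log-concave, since each summand is a concave function of an affine function of $\eta_i$. Similarly,

$$\log f(b_j \mid \eta, b_m\ (m \ne j), \gamma, x) = \sum_{i=1}^{n} \big[ x_{ij} \log F(\gamma_j \eta_i + b_j) + (1 - x_{ij}) \log \bar F(\gamma_j \eta_i + b_j) \big] + \log g_2(b_j) + \text{const} \tag{4.3.5}$$

and

$$\log f(\gamma_j \mid \eta, b, \gamma_s\ (s \ne j), x) = \sum_{i=1}^{n} \big[ x_{ij} \log F(\gamma_j \eta_i + b_j) + (1 - x_{ij}) \log \bar F(\gamma_j \eta_i + b_j) \big] + \log g_3(\gamma_j) + \text{const}. \tag{4.3.6}$$

Hence, if $F$, $\bar F$, $g_2$ and $g_3$ are log-concave, $f(b_j \mid \eta, b_m\ (m \ne j), \gamma, x)$ and $f(\gamma_j \mid \gamma_s\ (s \ne j), \eta, b, x)$ are log-concave. The log-concavity of $F$ and $\bar F$ is ensured since $F$ is an IFR distribution function.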
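Theorem 4 relies on $F$ and $\bar F = 1 - F$ being log-concave. For the three links used in Section 4.4 this can be spot-checked by finite differences: a concave function has nonpositive second differences. The sketch takes the "log-log" link to be the complementary log-log form $F(z) = 1 - \exp(-e^{z})$, which is our assumption; the grid, step and rounding tolerance are also our own choices, and a grid check is an illustration, not a proof.

```python
import math

SQRT2 = math.sqrt(2.0)

# (log F, log Fbar) for each link; "cloglog" is the assumed form of the log-log link.
LINKS = {
    "logit": (lambda z: -math.log1p(math.exp(-z)),
              lambda z: -math.log1p(math.exp(z))),
    "probit": (lambda z: math.log(0.5 * math.erfc(-z / SQRT2)),
               lambda z: math.log(0.5 * math.erfc(z / SQRT2))),
    "cloglog": (lambda z: math.log(-math.expm1(-math.exp(z))),
                lambda z: -math.exp(z)),
}

def max_second_difference(f, lo=-6.0, hi=6.0, h=0.01):
    # largest f(z + h) - 2 f(z) + f(z - h) over the grid; <= 0 (up to rounding) means concave there
    m = round((hi - lo) / h)
    return max(f(lo + s * h + h) - 2.0 * f(lo + s * h) + f(lo + s * h - h)
               for s in range(m + 1))

results = {name: (max_second_difference(lf), max_second_difference(lfbar))
           for name, (lf, lfbar) in LINKS.items()}
for name, (d_f, d_fbar) in results.items():
    print(f"{name:8s} max 2nd diff of log F: {d_f:.2e}, of log Fbar: {d_fbar:.2e}")
```

All three links have log-concave densities, which implies that both $F$ and $\bar F$ are log-concave, so adaptive rejection sampling applies to the full conditionals under each of them.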

4.4    An  Example 

In this section we consider the same placement data that we have been considering in the previous two chapters. Our interest lies in estimating the item parameters and the discrimination parameters, more specifically in the posterior means and posterior s.d.'s of the $b_j$ and $\gamma_j$ $(j = 1, \ldots, 7)$. Three different links, logit, probit and log-log, are considered.

Suppose the $\eta_i$ are iid normal, as chosen in Swaminathan and Gifford (1985); specifically, we take $N(0, 1)$ distributions for the $\eta_i$. The normality assumption for the ability parameters is convenient and may seem reasonable because previous studies may suggest that the population of abilities of subjects is bell shaped and that the subjects taking the test are a random sample from this population. If, on the other hand, a test uses a battery of questions for the first time, there may be little (if any) information about the distribution of the question difficulty and discrimination parameters. In that case, a flat or highly diffuse prior may seem sensible for them. However, we have seen that a flat prior for both item parameters leads to an improper posterior. If an item bank is available, then the normality assumption for the item parameters appears reasonable (Lord and Novick, 1968), because one can assume that the items included in the test come from a bell-shaped population of items. So we take the $b_j$ as iid normal. Swaminathan and Gifford (1985) used a chi distribution for the discrimination parameters; we put a half-normal distribution on the discrimination parameters $\gamma_j$. The entire prior specification is listed below:

$$\eta_i \sim N(0, 1), \qquad b_j \sim N(0, \tau^2), \qquad \gamma_j \sim N(0, \sigma^2)\, I[\gamma_j > 0]. \tag{4.4.1}$$

We take $\tau^2 = 1000$ to make the prior for $b$ as vague as possible. For the Bayesian analysis, $\sigma$ is treated as a tuning parameter and is used to study the sensitivity of the Bayes procedure. The posterior means and s.d.'s are calculated using the Gibbs sampler (Geman and Geman, 1984; Gelfand and Smith, 1990). For this example, the number of iterations used to generate each sample is 50, and the number of samples is 5000.

The usual competitors of the Bayes estimates of $\gamma$ and $b$ are the maximum likelihood estimates for the mixed model treating $\eta$ as a random effect. The associated standard errors for the item parameters are based on the asymptotic distribution of the marginal maximum likelihood estimators (MMLEs). A criticism of the maximum likelihood approach is that a limit must be imposed on the range of values taken by the discrimination parameters. This is tantamount to specifying a prior belief without a Bayesian justification.

A natural question in applying the Bayesian technique is the selection of $\sigma$, the standard deviation of the distribution of the discrimination parameters. If a value of $\sigma$ is available from previous work, that value can be used in these calculations. Otherwise, a 95% credibility interval for $\gamma_j$ may help in selecting $\sigma$. The mean for the distribution of [...] is 0 and the variance is $\sigma^2/4$. Then an approximate 95% credibility interval for $\gamma_j$ is $\pm z_{0.025}\, \sigma/2$. Hence the width of the interval is $W = \sigma z_{0.025}$, where $z_{0.025}$ is the upper 0.025 point of the $N(0, 1)$ distribution. Thus $\sigma = W / z_{0.025}$. Table 4.1 gives values of $\sigma$ for several values of $W$. Thus, by specifying the width $W$ of an interval for $\gamma_j$, one can specify the prior distribution of $\gamma_j$. Credibility intervals with $W$ in $(1, 5)$ are neither too precise nor too vague, so one way to choose $\sigma$ is to solve the above equation with $W$ in $(1, 5)$. We used three values of $\sigma$, namely 0.5, 1.0 and 1.5.

Table  4.1  Width  of  the  95%  credibility  interval  for  $\gamma_j$  and  the  corresponding  $\sigma$

 W       1      1.5    2      2.5    3      5      10
 σ       0.51   0.77   1.02   1.28   1.53   2.55   5.1
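Table 4.1 follows directly from the relation $W = \sigma z_{0.025}$ stated in the text. A minimal sketch that reproduces it, using only Python's standard-library `statistics.NormalDist` for $z_{0.025}$ (the function name is ours):

```python
from statistics import NormalDist

Z_0025 = NormalDist().inv_cdf(0.975)  # upper 0.025 point of N(0, 1), about 1.96

def sigma_for_width(W):
    # invert W = sigma * z_0.025
    return W / Z_0025

table = {W: round(sigma_for_width(W), 2) for W in (1, 1.5, 2, 2.5, 3, 5, 10)}
for W, sigma in table.items():
    print(f"W = {W:4}: sigma = {sigma}")
```

The printed values match every entry of Table 4.1.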

We report the marginal maximum likelihood estimates of $b$ and $\gamma$ and compare them with the corresponding Bayes estimates. The estimates for the three link functions are provided in Tables 4.2-4.4. Table 4.2 displays the Bayes estimates of the item effects, with the associated standard errors in parentheses, for the logit link using $\sigma = 0.5$, 1.0 and 1.5. Tables 4.3 and 4.4 show the Bayes estimates for the probit and log-log links. All three tables also provide the marginal MLEs. The marginal MLEs are also the posterior modes for the Bayesian approach that uses a flat prior for $(b, \gamma)$. So, not surprisingly, some of the marginal MLEs for $b$ are very close to the Bayes estimates, and the MMLEs of $\gamma$ are also somewhat close to the Bayes estimates. We also notice that the Bayes estimates tend to increase as the tuning parameter $\sigma$ increases; we observed the same type of phenomenon in the previous chapters. An obvious question is how to judge goodness of fit for this model. We need some measure to check whether this model gives a better fit than the simpler one-parameter model. This issue of model goodness of fit, which can be a topic of future research, is not considered in this dissertation.
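The $\sigma$ values in the headings of Tables 4.3 and 4.4 (0.276, 0.551, 0.827 and 0.354, 0.707, 1.061) are not derived in the text. One reading that reproduces every one of them is that the logit values 0.5, 1.0 and 1.5 were rescaled by the ratio of the standard deviation of the link distribution (normal: 1; extreme value: $\pi/\sqrt{6}$) to that of the logistic distribution ($\pi/\sqrt{3}$). This is our inference from the numbers, not a statement from the source; the sketch only checks the arithmetic.

```python
import math

SD_LOGISTIC = math.pi / math.sqrt(3.0)  # sd of the standard logistic distribution
SD_NORMAL = 1.0                         # sd of N(0, 1)
SD_GUMBEL = math.pi / math.sqrt(6.0)    # sd of the standard extreme-value distribution

def rescale(sigma_logit, sd_link):
    # hypothetical rule: keep sigma proportional to the sd of the link distribution
    return sigma_logit * sd_link / SD_LOGISTIC

probit = [round(rescale(s, SD_NORMAL), 3) for s in (0.5, 1.0, 1.5)]
loglog = [round(rescale(s, SD_GUMBEL), 3) for s in (0.5, 1.0, 1.5)]
print("probit:", probit)
print("log-log:", loglog)
```

Under this reading, the three prior scales play comparable roles across the links, which would make the columns of Tables 4.2-4.4 directly comparable.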

Table  4.2  MML  and  Bayes  estimates  for  logit  link

 j          MMLE           Bayes, σ = 0.5      Bayes, σ = 1.0      Bayes, σ = 1.5
         b        γ          b        γ          b        γ          b        γ
 1    (illegible)
 2     2.301    1.104      2.223    0.958      2.300    1.062      2.341    1.118
      (0.383)  (0.392)    (0.323)  (0.313)    (0.357)  (0.356)    (0.388)  (0.392)
 3     2.079    1.544      1.903    1.269      1.995    1.409      2.071    1.514
      (0.415)  (0.469)    (0.326)  (0.343)    (0.354)  (0.386)    (0.393)  (0.432)
 4     1.470    0.428      1.481    0.421      1.497    0.447      1.505    0.464
      (0.209)  (0.242)    (0.204)  (0.209)    (0.215)  (0.229)    (0.212)  (0.227)
 5    -1.272    1.077     -1.268    0.946     -1.277    1.027     -1.287    1.072
      (0.211)  (0.339)    (0.206)  (0.287)    (0.211)  (0.319)    (0.217)  (0.340)
 6     0.469    1.555      0.416    1.345      0.443    1.460      0.461    1.530
      (0.231)  (0.436)    (0.213)  (0.327)    (0.223)  (0.367)    (0.231)  (0.417)
 7     0.181    0.531      0.181    0.529      0.183    0.537      0.186    0.545
      (0.158)  (0.212)    (0.159)  (0.207)    (0.157)  (0.208)    (0.160)  (0.212)

Table  4.3  MML  and  Bayes  estimates  for  probit  link

 j          MMLE           Bayes, σ = 0.276    Bayes, σ = 0.551    Bayes, σ = 0.827
         b        γ          b        γ          b        γ          b        γ
 1     0.604    0.747      0.594    0.852      0.605    0.899      0.602    0.888
      (0.143)  (0.202)    (0.147)  (0.227)    (0.151)  (0.245)    (0.151)  (0.251)
 2     1.321    0.563      1.304    0.632      1.327    0.673      1.338    0.691
      (0.186)  (0.201)    (0.181)  (0.224)    (0.162)  (0.209)    (0.206)  (0.263)
 3     1.270    0.899      1.240    1.004      1.255    1.040      1.306    1.109
      (0.229)  (0.259)    (0.215)  (0.272)    (0.177)  (0.229)    (0.241)  (0.305)
 4     0.912    0.265      0.919    0.327      0.897    0.294      0.922    0.334
      (0.122)  (0.143)    (0.119)  (0.159)    (0.120)  (0.148)    (0.123)  (0.166)
 5    -0.692    0.606     -0.713    0.696     -0.710    0.780     -0.713    0.719
      (0.112)  (0.179)    (0.114)  (0.195)    (0.116)  (0.192)    (0.113)  (0.208)
 6     0.370    0.958      0.346    1.067      0.358    1.138      0.360    1.127
      (0.145)  (0.245)    (0.137)  (0.239)    (0.147)  (0.253)    (0.146)  (0.273)
 7     0.146    0.353      0.137    0.401      0.136    0.424      0.142    0.423
      (0.101)  (0.131)    (0.100)  (0.150)    (0.100)  (0.153)    (0.103)  (0.157)

Table  4.4  MML  and  Bayes  estimates  for  log-log  link

 j          MMLE           Bayes, σ = 0.354    Bayes, σ = 0.707    Bayes, σ = 1.061
         b        γ          b        γ          b        γ          b        γ
 1     1.077    0.946      1.054    0.899      1.097    0.978      1.094    0.983
      (0.195)  (0.272)    (0.181)  (0.244)    (0.195)  (0.279)    (0.197)  (0.283)
 2     2.303    1.00       2.235    0.879      2.307    0.975      2.310    0.984
      (0.342)  (0.341)    (0.293)  (0.292)    (0.317)  (0.322)    (0.335)  (0.346)
 3     2.01     1.218      1.917    1.092      1.990    1.180      2.005    1.214
      (0.324)  (0.347)    (0.271)  (0.280)    (0.322)  (0.343)    (0.312)  (0.341)
 4     1.572    0.407      1.589    0.392      1.597    0.409      1.592    0.404
      (0.187)  (0.211)    (0.183)  (0.191)    (0.185)  (0.201)    (0.189)  (0.201)
 5    -0.380    0.526     -0.384    0.600     -0.381    0.559     -0.388    0.580
      (0.101)  (0.167)    (0.105)  (0.174)    (0.104)  (0.180)    (0.105)  (0.195)
 6     0.702    0.971      0.692    0.961      0.724    1.041      0.735    1.082
      (0.165)  (0.278)    (0.157)  (0.247)    (0.171)  (0.316)    (0.182)  (0.355)
 7     0.501    0.399      0.508    0.400      0.511    0.409      0.508    0.404
      (0.119)  (0.149)    (0.119)  (0.151)    (0.121)  (0.150)    (0.117)  (0.153)

CHAPTER  5 
SUMMARY  AND  FUTURE  RESEARCH 

This dissertation primarily focuses on developing Bayesian methods for one- and two-parameter item response models. Most of the Bayesian articles in the literature have so far been link specific; we brought many of the results together under a general link function. One of the main objectives was to develop Bayesian methods that ensure propriety of the posteriors. Throughout this dissertation we discussed extensively the choice of priors that leads to proper posterior distributions, and we provided a general algorithm to implement our methodology in a given context.

In Chapter 2, we developed a Bayesian technique for one-parameter item response models with a general link function. We derived specific conditions for the posteriors to be proper while using improper priors. We also discussed marginal maximum likelihood estimates for item parameters and provided conditions under which they are consistent estimators. A uniform approximation of improper priors by proper ones is also provided in that chapter. We dealt separately with Bayesian estimation of item parameters in the matched-pairs case. One important issue was the influence (if any) of the main diagonal counts on the Bayes estimates. We have shown that, unlike other methods such as conditional and marginal maximum likelihood, the Bayes procedure does depend on the main diagonal counts for 2 × 2 tables.

In  Chapter  3  we  developed  a  hierarchical  Bayes  method  for  one-parameter  item 
response  models.  Here  we  have  provided  conditions  which  lead  to  proper  posteriors 
for  item  parameters.  We  discussed  a  general  algorithm  for  implementation  of  the 
procedure  in  any  given  situation. 




In Chapter 4 we gave a classical Bayesian methodology for two-parameter item response models. We examined carefully the problem of improper posteriors and suggested using proper priors only. A general algorithm is also provided to implement the technique.

The Bayes procedure has a few advantages over the marginal maximum likelihood approach. An obvious advantage is that, with a hierarchical prior, one can avoid choosing an arbitrary value for the scale parameter of the subject ability distribution. Another criticism of the maximum likelihood approach is that, for two-parameter models, a limit must be imposed on the range of values taken by the discrimination parameters. This is tantamount to specifying a prior belief without a Bayesian justification.

In all the chapters we use data from a mathematics placement test, obtained from Professor James Albert of Bowling Green State University. Throughout the dissertation, the implementation of the Bayes methodology has been illustrated using a Markov chain Monte Carlo integration technique known as the Gibbs sampler. Using this procedure, the posterior density as well as conditional means and variances can be obtained with considerable ease. In addition, a technique called adaptive rejection sampling has been used extensively to generate samples from log-concave densities.

As for future research, clearly many important questions are left unanswered. One important consideration is the choice of link function. One way to address this is to compute the Bayes factor of one model with respect to another and use it for model selection. Another important consideration is whether to use an adaptive link function, estimating the link from the data as is done in Mallick and Gelfand (1994), and eventually to use a nonparametric procedure.

As mentioned in Chapter 4, an interesting topic of future research could be model selection for a given link. For example, it might be important to know when it is necessary to use a more complex model with discrimination parameters. Also, a general goodness-of-fit measure might help in deciding which model to consider in a given situation. This will be a future research topic.

In Chapter 2, Section 3, we proved a result showing propriety of the posteriors using flat priors for both $\theta$ and $a$ in the special case $k = 2$. Future work should investigate whether this result holds for arbitrary $k$. In that chapter we also noticed a conflicting behavior of the Bayes estimates in the matched-pairs case, which should also be investigated in further detail.

The general techniques used in this dissertation could potentially be adapted to more complex parametrizations. A further complexity can be introduced into these models by including a guessing parameter. Such models, usually referred to as three-parameter item response models, will also be a topic of future study.


BIBLIOGRAPHY 


Albert,  J.H.  (1992).  Bayesian  estimation  of  normal  ogive  item  response  curve  using  Gibbs 
sampling.  Journal  of  Educational  Statistics,  17,  251-269. 

Andersen,  E.B.  (1970).  Asymptotic  properties  of  conditional  maximum  likelihood  estima- 
tors. The  Journal  of  the  Royal  Statistical  Society,  Series  B,  32,  283-301. 

Andersen,  E.B.  (1972).  The  numerical  solution  of  a  set  of  conditional  estimation  equations. 
The  Journal  of  the  Royal  Statistical  Society,  Series  B,  34,  42-54. 

Andersen,  E.B.  (1973).  Conditional  inference  for  multiple-choice  questionnaires.  British 
Journal  of  Mathematical  and  Statistical  Psychology,  26,  31-44. 

Birnbaum,  A.  (1969).  Statistical  theory  for  logistic  mental  test  models  with  a  prior  distri- 
bution of  ability.  Journal  of  Mathematical  Psychology,  6,  258-276. 

Bock,  R.D.  &  Aitkin,  M.  (1981).  Marginal  maximum  likelihood  estimation  of  item  param- 
eters: application  of  an  EM  algorithm.  Psychometrika,  46,  443-459. 

Bock,  R.D.  &  Lieberman,  M.  (1970).  Fitting  a  response  model  for  n  dichotomously  scored 
items.  Psychometrika,  35,  179-197. 

Casella,  G.  &  George,  E.I.  (1992).  Explaining  the  Gibbs  sampler.  The  American  Statisti- 
cian, 46,  167-174. 

Cox,  D.R.  (1958).  The  regression  analysis  of  binary  sequences.  J.  R.  Statist.  Soc,  B,  20, 
215-242. 

Cox,  D.R.  &  Snell,  E.J.  (1989).  The  Analysis  of  Binary  Data.  2nd  edn.,  London:  Chapman 
and  Hall. 

Dawid,  A. P.  (1979).  Conditional  independence  in  statistical  theory  (with  discussion).  J.  R. 
Statist.  Soc,  B,  41,  1-31. 

De  Leeuw,  J.  &  Verhelst,  N.  (1986).  Maximum  likelihood  estimation  in  generalized  Rasch 
models.  Journal  of  Educational  and  Behavioral  Statistics,  11,  183-196. 


104 


105 


Fischer,  G.H.  &  Molenaar,  I.W.  (1995).  Rasch  Models:  Foundations,  Recent  Developments 
and  Applications.  New  york:  Springer- Ver lag. 

Follman,  D.A.  (1988).  Consistent  estimation  in  the  Rasch  model  based  on  nonparametric 
margins.  Psychometrika,  53,  553-562. 

Foutz,  R.V.  (1977).  On  the  unique  consistent  solution  to  the  likelihood  equations.  Journal 
of  the  American  Statistical  Association,  72,  147-148. 

Gelman,  A.  k.  Rubin,  D.B.  (1992).  Inference  from  iterative  simulation  using  multiple  se- 
quences. Statistical  Science,  7,  457-511. 

Ghosh,  M.  (1995).  Inconsistent  maximum  likelihood  estimators  for  the  Rasch  model.  Statis- 
tics and  Probability  Letters,  23,  165-170. 

Gilks,  W.R.  &:  Wild,  P.  (1992).  Adaptive  rejection  sampling  for  Gibbs  sampling.  Applied 
Statistics,  41,  337-348. 

Hambleton,  R.K.  &:  Swaminathan,  H.  (1985).  Item  Response  Theory:  Principles  and 
Applications.  Boston:  Kluwer-Nijhoff, 

Kim,  S.,  Cohen,  A.S.,  Baker,  F.B.,  Subkoviak,  M.J.,  and  Leonard,  T.  (1994).  An  investi- 
gation of  hierarchical  Bayes  procedures  in  item  response  theory.  Psychometrika,  59, 
405-421. 

Leonard,  T.,  and  Novick,  M.R.  (1985).  Bayesian  inference  and  diagnostics  for  the  three 
parameter  logistic  model  (ONR  Technical  Report  85-5).  Iowa  City,  IA:  The  University 
of  Iowa,  CADA  Research  Group. 

Liang,  K.  &;  Zeger,  S.L.  (1988).  On  the  use  of  Concordant  pairs  in  Matched  case-control 
studies.  Biometrics,  44,  1145-1156. 

Lindsay,  B.,  Clogg,  C.C.,  &  Grego,  J.  (1991).  Semiparametric  estimation  in  the  Rasch  model 
and  related  exponential  response  models,  including  a  simple  latent  class  model  for  item 
analysis.  Journal  of  the  American  Statistical  Association,  86,  96-107. 

Lord,  F.M.  (1952).  A  theory  of  test  scores.  Psychometric  monograph,  7. 

Lord,  F.M.  (1953a).  An  application  of  confidence  intervals  and  of  maximum  likelihood  to 
the  estimation  of  an  examinee's  ability.  Psychometrika,  18,  57-75. 

Lord,  F.M.  (1953b).  The  relation  of  test  score  to  the  trait  underlying  the  test.  Educational 
and  Psychological  measurement,  13,  517-548. 

Mallick,  B.K.  k  Gelfand,  A.E.  (1994).  Generalized  linear  models  with  unknown  link  func- 
tions. Biometrika,  81,  237-245. 


106 


Mislevy,  R.J.  k  Bock,  R.D.  (1984).  BILOG  maximum  likelihood  item  analysis  and  test 
scoring:  Logistic  model.  Mooresville,  ID:  Scientific  Software. 

Neuhaus,  J.M.,  Kalbfleisch,  J.D.,  and  Hauck,  W.W.  (1991).  A  comparison  of  cluster-specific 
and  population-averaged  approaches  for  analyzing  correlated  binary  data.  Interna- 
tional Statistical  Review,  59,  25-35. 

Neyman,  J.,  k  Scott,  E.  L.  (1948).  Consistent  estimates  based  on  partially  consistent  ob- 
servations. Econometrika,  16,  1-32. 

O'Hagan,  A.  (1994).  Kendall's  Advanced  Theory  of  Statistics.  Volume  2B.  New  York:  John 
Wiley. 

Owen,  R.  (1975).  A  Bayesian  sequential  procedure  for  quantal  response  in  the  context  of 
adaptive  mental  testing.  Journal  of  the  American  Statistical  Association,  70,  351-356. 

Rasch,  G.  (1960).  Probabilistic  models  for  some  intelligence  and  attainment  tests.  Copen- 
hagen: Danish  Institute  for  Educational  Research. 

Rasch,  G.  (1961).  On  general  laws  and  the  meaning  of  measurement  in  psychology.  Proceed- 
ings of  the  4th  Berkeley  Symposium  on  Mathematical  Statistics.  Berkeley:  University 
of  California  press,  4. 

Richardson,  M.W.  (1936).  The  relationship  between  difficulty  and  differential  validity  of  a 
test.  Psychometrika,  1,  33-49. 

Rigdon,  S.E.  k  Tsutakawa,  R.K.  (1983).  Parameter  estimation  in  latent  trait  models.  Psy- 
chometrika, 48,  567-574. 

Rigdon,  S.E.  k  Tsutakawa,  R.K.  (1987).  Estimation  for  the  Rasch  model  when  both  ability 
and  difficulty  parameters  are  random.  Journal  of  Educational  Statistics,  12,  76-86. 

Sahu,  S.K.  k  Gelfand,  A.E.  (1995).  On  propriety  of  posteriors  and  Bayesian  identifiability 
in  generalized  linear  models.  Preprint. 

Swaminathan,  H.  k  Gifford,  J. A.  (1981).  Estimation  of  parameters  in  the  three-parameter 
latent  trait  model.  Laboratory  of  Psychometric  and  Evaluative  Research  Report  No. 
93.  Amherst,  Mass.:  School  of  Education,  University  of  Massachusetts. 

Swaminathan,  H.  k  Gifford,  J.A.  (1982).  Bayesian  estimation  in  the  Rasch  model.  Journal 
of  Educational  Statistics,  7,  175-192. 

Swaminathan,  H.  k  Gifford,  J.A.  (1985).  Bayesian  estimation  in  the  two-parameter  logistic 
model.  Psychometrika,  50,  349-364. 


107 


Tsutakawa,  R.K.  (1984).  Estimation  of  two-parameter  logistic  item  response  curves.  Jour- 
nal of  Educational  Statistics,  9,  263-276. 

Tsutakawa,  R.K.  &  Lin,  H.Y.  (1986).  Bayesian  estimation  of  item  response  curves.  Psy- 
chometrika, 51,  251-267. 

Tsutakawa, R.K. & Soltys, M.J. (1988). Approximation for Bayesian ability estimation.
Journal of Educational Statistics, 13, 117-130.

Tsutakawa, R.K. & Johnson, J.C. (1990). The effect of uncertainty of item parameter
estimation on ability estimates. Psychometrika, 55, 371-390.


Tucker,  L.R.  (1946).  Maximum  validity  of  a  test  with  equivalent  items.  Psychometrika,  11, 
1-13. 


BIOGRAPHICAL  SKETCH 


Atalanta Ghosh was born on July 14, 1964, in Calcutta, West Bengal, India. He
received his Bachelor of Science degree with a major in statistics in 1987 from Asutosh
College, Calcutta, India. In 1987, he joined the University of Calcutta to pursue his
Master of Science in statistics. He received this degree in 1989 from the University of
Calcutta, Calcutta, India. In August 1990 he joined the doctoral program in Statistics
at the University of Florida, Gainesville. He expects to receive a Ph.D. degree in
December 1996. During his time at the University of Florida, he worked as a research
assistant to Dr. Alan Agresti and Dr. Malay Ghosh. He was also employed as a
teaching assistant in the Department of Statistics and taught some undergraduate
courses. He worked as a consulting statistician at the Institute of Food and
Agricultural Sciences, a division of the Department of Statistics, under the supervision
of Dr. Kenneth M. Portier. Upon graduation, he will join Computer Task Group Inc.
as a consulting statistician and work as a contractor to the Division of
Mathematical Sciences, Eli Lilly, Indianapolis, Indiana.




I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.

Alan  Agresti,  Chairman 
Professor  of  Statistics 


I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.


Malay Ghosh
Professor  of  Statistics 


I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.


Kenneth  M.  Portier 

Associate  Professor  of  Statistics 


I certify that I have read this study and that in my opinion it conforms to acceptable
standards of scholarly presentation and is fully adequate, in scope and quality,
as a dissertation for the degree of Doctor of Philosophy.


James J. Algina
Professor of Foundations of Education


This  dissertation  was  submitted  to  the  Graduate  Faculty  of  the  Department  of 
Statistics  in  the  College  of  Liberal  Arts  and  Sciences  and  to  the  Graduate  School  and 
was  accepted  as  partial  fulfillment  of  the  requirements  for  the  degree  of  Doctor  of 
Philosophy. 

December  1996 


Dean,  Graduate  School 


